
10.3: The Matrix Exponential as a Sum of Powers


You may recall from Calculus that for any numbers \(a\) and \(t\) one may obtain \(e^{at}\) via

\[e^{at} = \sum_{k=0}^{\infty} \frac{(at)^k}{k!}\]

The natural matrix definition is therefore

\[e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}\]

where \(A^{0} = I\) is the n-by-n identity matrix.

Example \(\PageIndex{1}\)

The easiest case is the diagonal case, e.g.,

\[A = \begin{pmatrix} 1&0 \\ 0&2 \end{pmatrix}\]

for then

\[(At)^k = \begin{pmatrix} t^k&0 \\ 0&(2t)^k \end{pmatrix}\]

and so

\[e^{At} = \begin{pmatrix} e^t&0 \\ 0&e^{2t} \end{pmatrix}\]

Note that this is NOT the exponential of each element of \(A\).

Example \(\PageIndex{2}\)

As a second example let us suppose

\[A = \begin{pmatrix} 0&1 \\ -1&0 \end{pmatrix}\]

We recognize that its powers cycle, i.e.,

\[A^2 = \begin{pmatrix} -1&0 \\ 0&-1 \end{pmatrix}\]

\[A^3 = \begin{pmatrix} 0&-1 \\ 1&0 \end{pmatrix}\]

\[A^4 = \begin{pmatrix} 1&0 \\ 0&1 \end{pmatrix}\]

\[A^5 = \begin{pmatrix} 0&1 \\ -1&0 \end{pmatrix} = A\]

and so

\[e^{At} = \begin{pmatrix} 1-\frac{t^2}{2!}+\frac{t^4}{4!}-\cdots & t-\frac{t^3}{3!}+\frac{t^5}{5!}-\cdots \\ -t+\frac{t^3}{3!}-\frac{t^5}{5!}+\cdots & 1-\frac{t^2}{2!}+\frac{t^4}{4!}-\cdots \end{pmatrix} = \begin{pmatrix} \cos(t)&\sin(t) \\ -\sin(t)&\cos(t) \end{pmatrix}\]

Example \(\PageIndex{3}\)

If

\[A = \begin{pmatrix} 0&1 \\ 0&0 \end{pmatrix}\]

then

\[A^2 = A^3 = \cdots = A^k = \begin{pmatrix} 0&0 \\ 0&0 \end{pmatrix} \quad (k \ge 2)\]

and so

\[e^{At} = I + tA = \begin{pmatrix} 1&t \\ 0&1 \end{pmatrix}\]
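These examples are easy to check numerically. The sketch below (NumPy; the helper name `expm_series` and the truncation length are illustrative choices, not part of the text) sums the series directly and compares the result with the closed form from Example 2:

```python
import numpy as np

def expm_series(A, t=1.0, terms=30):
    """Approximate e^{At} by truncating the series sum_k (At)^k / k!."""
    n = A.shape[0]
    term = np.eye(n)                 # (At)^0 / 0! = I
    total = np.eye(n)
    for k in range(1, terms):
        term = term @ (A * t) / k    # term now holds (At)^k / k!
        total = total + term
    return total

# Example 2: A = [[0, 1], [-1, 0]] should give [[cos t, sin t], [-sin t, cos t]].
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 0.7
print(np.allclose(expm_series(A, t),
                  [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]]))  # True
```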


That’s what I mean, we would have to use the exponential to define it, like ##e^{\begin{pmatrix}-1&3\\1&-5\end{pmatrix}}##, making the “identity” a definition.

I was overlooking that point, thanks.

I’m still curious about the scalar analog of the question. I know it’s pointless but I’m just wondering if someone has a solution to the following because I haven’t a clue how to approach it:

Suppose we defined ##e^x## by its power series ##e^x=1+x+\frac{x^2}{2!}+\cdots+\frac{x^k}{k!}+\cdots## (where x is a scalar now, not a matrix).

Starting from that definition, how would we prove that ##(e^x)^n=e^{nx}## for any real n?

I’m just curious; there ought to be a way. So if you smart people are bored, then that is my problem for you.
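One standard route, sketched here under the series definition (this is just the usual Cauchy-product argument, not the only way): multiplying the two series and collecting terms of equal total degree reproduces the binomial theorem inside the sum,

\[
e^x e^y = \sum_{k=0}^{\infty}\sum_{j=0}^{k}\frac{x^j}{j!}\cdot\frac{y^{k-j}}{(k-j)!}
        = \sum_{k=0}^{\infty}\frac{1}{k!}\sum_{j=0}^{k}\binom{k}{j}x^j y^{k-j}
        = \sum_{k=0}^{\infty}\frac{(x+y)^k}{k!} = e^{x+y}.
\]

Induction then gives ##(e^x)^n=e^{nx}## for positive integers n, the identity ##e^x e^{-x}=e^0=1## extends it to negative integers, uniqueness of positive roots handles rational n, and continuity of both sides in n covers the reals.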



The term power (Latin: potentia, potestas, dignitas) is a mistranslation [4] [5] of the ancient Greek δύναμις (dúnamis, here: "amplification" [4] ) used by the Greek mathematician Euclid for the square of a line, [6] following Hippocrates of Chios. [7] Archimedes discovered and proved the law of exponents, 10^a ⋅ 10^b = 10^(a+b), necessary to manipulate powers of 10. [8] In the 9th century, the Persian mathematician Muhammad ibn Mūsā al-Khwārizmī used the terms مَال (māl, "possessions", "property") for a square—the Muslims, "like most mathematicians of those and earlier times, thought of a squared number as a depiction of an area, especially of land, hence property" [9] —and كَعْبَة (kaʿbah, "cube") for a cube, which later Islamic mathematicians represented in mathematical notation as the letters mīm (m) and kāf (k), respectively, by the 15th century, as seen in the work of Abū al-Hasan ibn Alī al-Qalasādī. [10]

In the late 16th century, Jost Bürgi used Roman numerals for exponents. [11]

Nicolas Chuquet used a form of exponential notation in the 15th century, which was later used by Henricus Grammateus and Michael Stifel in the 16th century. The word exponent was coined in 1544 by Michael Stifel. [12] [13] Samuel Jeake introduced the term indices in 1696. [6] In the 16th century, Robert Recorde used the terms square, cube, zenzizenzic (fourth power), sursolid (fifth), zenzicube (sixth), second sursolid (seventh), and zenzizenzizenzic (eighth). [9] Biquadrate has been used to refer to the fourth power as well.

Early in the 17th century, the first form of our modern exponential notation was introduced by René Descartes in his text titled La Géométrie; there, the notation is introduced in Book I. [14]

Some mathematicians (such as Isaac Newton) used exponents only for powers greater than two, preferring to represent squares as repeated multiplication. Thus they would write polynomials, for example, as ax + bxx + cx^3 + d.

Another historical synonym, involution, is now rare [15] and should not be confused with its more common meaning.

"consider exponentials or powers in which the exponent itself is a variable. It is clear that quantities of this kind are not algebraic functions, since in those the exponents must be constant." [16]

With this introduction of transcendental functions, Euler laid the foundation for the modern introduction of natural logarithm—as the inverse function for the natural exponential function, f(x) = e x .

The expression b^2 = b⋅b is called "the square of b" or "b squared", because the area of a square with side-length b is b^2.

Similarly, the expression b^3 = b⋅b⋅b is called "the cube of b" or "b cubed", because the volume of a cube with side-length b is b^3.

When it is a positive integer, the exponent indicates how many copies of the base are multiplied together. For example, 3^5 = 3 ⋅ 3 ⋅ 3 ⋅ 3 ⋅ 3 = 243. The base 3 appears 5 times in the multiplication, because the exponent is 5. Here, 243 is the 5th power of 3, or 3 raised to the 5th power.

The word "raised" is usually omitted, and sometimes "power" as well, so 3^5 can be simply read "3 to the 5th", or "3 to the 5". Therefore, the exponentiation b^n can be expressed as "b to the power of n", "b to the nth power", "b to the nth", or most briefly as "b to the n".

A formula with nested exponentiation, such as 3^5^7 (which means 3^(5^7) and not (3^5)^7), is called a tower of powers, or simply a tower.

The exponentiation operation with integer exponents may be defined directly from elementary arithmetic operations.

Positive exponents

Powers with positive integer exponents may be defined by the base case [17] b^1 = b and the recurrence relation b^(n+1) = b^n ⋅ b.

The associativity of multiplication implies that for any positive integers m and n, b^(m+n) = b^m ⋅ b^n.
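A two-line recursive sketch (Python; the helper name `power` is an illustrative choice) mirrors this inductive definition:

```python
def power(b, n):
    """Positive integer powers via the base case b^1 = b and b^(n+1) = b^n * b."""
    return b if n == 1 else power(b, n - 1) * b

assert power(3, 5) == 243
```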

Zero exponent

Any nonzero number raised to the 0 power is 1: [18] [2] b^0 = 1.

One interpretation of such a power is as an empty product.

The case of 0 0 is more complicated, and the choice of whether to assign it a value and what value to assign may depend on context. For more details, see Zero to the power of zero.

Negative exponents

The following identity holds for any integer n and nonzero b: b^(−n) = 1/b^n.

Raising 0 to a negative exponent is undefined, but in some circumstances, it may be interpreted as infinity ( ∞ ).

The identity above may be derived through a definition aimed at extending the range of exponents to negative integers.

For non-zero b and positive n, the recurrence relation above can be rewritten as b^(n−1) = b^n / b.

By defining this relation as valid for all integer n and nonzero b, it follows that b^(−1) = 1/b,

and more generally for any nonzero b and any nonnegative integer n, b^(−n) = 1/b^n.

This is then readily shown to be true for every integer n .

Identities and properties

The following identities hold for all integer exponents, provided that the base is non-zero: [2] b^(m+n) = b^m ⋅ b^n, (b^m)^n = b^(mn), and (b⋅c)^n = b^n ⋅ c^n.

Unlike addition and multiplication:

  • Exponentiation is not commutative. For example, 2^3 = 8 ≠ 3^2 = 9.
  • Exponentiation is not associative. For example, (2^3)^4 = 8^4 = 4096, whereas 2^(3^4) = 2^81 = 2 417 851 639 229 258 349 412 352. Without parentheses, the conventional order of operations for serial exponentiation in superscript notation is top-down (or right-associative), not bottom-up [19][20][21][22] (or left-associative). That is, b^p^q = b^(p^q),

which, in general, is different from (b^p)^q, as in the sketch below.
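A small illustration: Python's ** operator happens to follow the same top-down (right-associative) convention.

```python
# Serial exponentiation is read top-down: 2 ** 3 ** 4 means 2 ** (3 ** 4).
print(2 ** 3 ** 4 == 2 ** 81)      # True
print((2 ** 3) ** 4)               # 4096, a very different number
```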

Powers of a sum

The powers of a sum can normally be computed from the powers of the summands by the binomial formula (a + b)^n = Σ_(k=0..n) C(n, k) a^k b^(n−k), where C(n, k) is the binomial coefficient.

However, this formula is true only if the summands commute (i.e. that ab = ba ), which is implied if they belong to a structure that is commutative. Otherwise, if a and b are, say, square matrices of the same size, this formula cannot be used. It follows that in computer algebra, many algorithms involving integer exponents must be changed when the exponentiation bases do not commute. Some general purpose computer algebra systems use a different notation (sometimes ^^ instead of ^ ) for exponentiation with non-commuting bases, which is then called non-commutative exponentiation.
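The failure for non-commuting bases is easy to exhibit numerically; here is a minimal NumPy sketch (the two matrices are arbitrary choices for illustration):

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])

lhs = np.linalg.matrix_power(A + B, 2)     # (A + B)^2
rhs = A @ A + 2 * (A @ B) + B @ B          # A^2 + 2AB + B^2

print(np.array_equal(A @ B, B @ A))  # False: the matrices do not commute
print(np.array_equal(lhs, rhs))      # False: the binomial formula fails here
```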

Combinatorial interpretation

For nonnegative integers n and m, the value of n^m is the number of functions from a set of m elements to a set of n elements (see cardinal exponentiation). Such functions can be represented as m-tuples from an n-element set (or as m-letter words from an n-letter alphabet). Some examples for particular values of m and n are given in the following table:

n^m: the n^m possible m-tuples of elements from the set {1, …, n}
0^5 = 0: none
1^4 = 1: (1,1,1,1)
2^3 = 8: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2)
3^2 = 9: (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)
4^1 = 4: (1), (2), (3), (4)
5^0 = 1: ()
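The table can be regenerated with a few lines of Python, which makes the counting interpretation concrete:

```python
from itertools import product

# n^m counts the m-tuples over {1, ..., n}, i.e. functions from m things to n things.
for n, m in [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]:
    tuples = list(product(range(1, n + 1), repeat=m))
    assert len(tuples) == n ** m
    print(f"{n}^{m} = {n ** m}: {tuples}")
```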

Particular bases

Powers of ten

In the base ten (decimal) number system, integer powers of 10 are written as the digit 1 followed or preceded by a number of zeroes determined by the sign and magnitude of the exponent. For example, 10^3 = 1000 and 10^(−4) = 0.0001.

Exponentiation with base 10 is used in scientific notation to denote large or small numbers. For instance, 299 792 458 m/s (the speed of light in vacuum, in metres per second) can be written as 2.997 924 58 × 10^8 m/s and then approximated as 2.998 × 10^8 m/s.

SI prefixes based on powers of 10 are also used to describe small or large quantities. For example, the prefix kilo means 10^3 = 1000, so a kilometre is 1000 m.


Powers of two

The first negative powers of 2 are commonly used, and have special names, e.g.: half and quarter.

Powers of 2 appear in set theory, since a set with n members has a power set, the set of all of its subsets, which has 2^n members.

Integer powers of 2 are important in computer science. The positive integer powers 2^n give the number of possible values for an n-bit integer binary number; for example, a byte may take 2^8 = 256 different values. The binary number system expresses any number as a sum of powers of 2, and denotes it as a sequence of 0 and 1, separated by a binary point, where 1 indicates a power of 2 that appears in the sum; the exponent is determined by the place of this 1: the nonnegative exponents are the rank of the 1 on the left of the point (starting from 0), and the negative exponents are determined by the rank on the right of the point.
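A short sketch of the binary decomposition described above (plain Python, illustrative only):

```python
# Recover the powers of 2 indicated by the 1-digits of a binary representation.
x = 100
bits = bin(x)[2:]                                         # '1100100'
powers = [2 ** i for i, b in enumerate(reversed(bits)) if b == '1']
print(bits, powers, sum(powers) == x)                     # 1100100 [4, 32, 64] True
```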

Powers of one

The powers of one are all one: 1^n = 1.

Powers of zero

If the exponent n is positive (n > 0), the nth power of zero is zero: 0^n = 0.

If the exponent n is negative (n < 0), the nth power of zero 0^n is undefined, because it must equal 1/0^(−n) with −n > 0, and this would be 1/0 according to above.

The expression 0^0 is either defined as 1, or it is left undefined (see Zero to the power of zero).

Powers of negative one

If n is an even integer, then (−1)^n = 1.

If n is an odd integer, then (−1)^n = −1.

Because of this, powers of −1 are useful for expressing alternating sequences. For a similar discussion of powers of the complex number i , see § Powers of complex numbers.

Large exponents

The limit of a sequence of powers of a number greater than one diverges; in other words, the sequence grows without bound:

b^n → ∞ as n → ∞ when b > 1

This can be read as "b to the power of n tends to +∞ as n tends to infinity when b is greater than one".

Powers of a number with absolute value less than one tend to zero:

b^n → 0 as n → ∞ when |b| < 1

Any power of one is always one:

b^n = 1 for all n if b = 1

Powers of –1 alternate between 1 and –1 as n alternates between even and odd, and thus do not tend to any limit as n grows.

If b < −1, b^n alternates between larger and larger positive and negative numbers as n alternates between even and odd, and thus does not tend to any limit as n grows.

If the exponentiated number varies while tending to 1 as the exponent tends to infinity, then the limit is not necessarily one of those above. A particularly important case is

(1 + 1/n)^n → e as n → ∞

Other limits, in particular those of expressions that take on an indeterminate form, are described in § Limits of powers below.


List of whole-number powers

n n^2 n^3 n^4 n^5 n^6 n^7 n^8 n^9 n^10
2 4 8 16 32 64 128 256 512 1024
3 9 27 81 243 729 2187 6561 19 683 59 049
4 16 64 256 1024 4096 16 384 65 536 262 144 1 048 576
5 25 125 625 3125 15 625 78 125 390 625 1 953 125 9 765 625
6 36 216 1296 7776 46 656 279 936 1 679 616 10 077 696 60 466 176
7 49 343 2401 16 807 117 649 823 543 5 764 801 40 353 607 282 475 249
8 64 512 4096 32 768 262 144 2 097 152 16 777 216 134 217 728 1 073 741 824
9 81 729 6561 59 049 531 441 4 782 969 43 046 721 387 420 489 3 486 784 401
10 100 1000 10 000 100 000 1 000 000 10 000 000 100 000 000 1 000 000 000 10 000 000 000

An nth root of a number b is a number x such that x^n = b.

If b is a positive real number and n is a positive integer, then there is exactly one positive real solution to x^n = b. This solution is called the principal nth root of b. It is denoted ⁿ√b, where √ is the radical symbol; alternatively, the principal nth root of b may be written b^(1/n). For example: 9^(1/2) = √9 = 3 and 8^(1/3) = ³√8 = 2.

If b is equal to 0, the equation x^n = b has one solution, which is x = 0.

If n is even and b is positive, then x^n = b has two real solutions, which are the positive and negative nth roots of b, that is, b^(1/n) > 0 and −(b^(1/n)) < 0.

If n is even and b is negative, the equation has no solution in real numbers.

If n is odd, then x^n = b has exactly one real solution, which is positive if b is positive (b^(1/n) > 0) and negative if b is negative (b^(1/n) < 0).

Taking a positive real number b to a rational exponent u/v, where u is an integer and v is a positive integer, and considering principal roots only, yields b^(u/v) = (b^u)^(1/v) = (b^(1/v))^u.

Taking a negative real number b to a rational power u/v, where u/v is in lowest terms, yields a positive real result if u is even (and hence v is odd), because then b^u is positive, and yields a negative real result if u and v are both odd, because then b^u is negative. The case of even v (and, hence, odd u) cannot be treated this way within the reals, since there is no real number x such that x^(2k) = −1; the value of b^(u/v) in this case must use the imaginary unit i, as described more fully in the section § Powers of complex numbers.

Thus we have (−27)^(1/3) = −3 and (−27)^(2/3) = 9. The number 4 has two 3/2 powers, namely 8 and −8; however, by convention the notation 4^(3/2) employs the principal root, and results in 8. Because it employs the v-th root, the u/v-th power is also called the v/u-th root, and for even v the term principal root also denotes the positive result.

This sign ambiguity needs to be taken care of when applying the power identities. For instance, a chain such as −1 = (−1)^1 = (−1)^(2⋅(1/2)) = ((−1)^2)^(1/2) = 1^(1/2) = 1

is clearly wrong. The problem starts already in the first equality by introducing a standard notation for an inherently ambiguous situation (asking for an even root) and simply relying wrongly on only one, the conventional or principal, interpretation. The same problem occurs also with an inappropriately introduced surd-notation, inherently enforcing a positive result.

In general the same sort of problems occur for complex numbers as described in the section § Failure of power and logarithm identities.

Exponentiation to real powers of positive real numbers can be defined either by extending the rational powers to reals by continuity, or more usually as given in § Powers via logarithms below. The result is always a positive real number, and the identities and properties shown above for integer exponents are true for positive real bases with non-integer exponents as well.

On the other hand, exponentiation to a real power of a negative real number is much more difficult to define consistently, as it may be non-real and have several values (see § Real exponents with negative bases). One may choose one of these values, called the principal value, but there is no choice of the principal value for which an identity such as (b^r)^s = b^(rs)

is true; see § Failure of power and logarithm identities. Therefore, exponentiation with a basis that is not a positive real number is generally viewed as a multivalued function.

Limits of rational exponents

Since any irrational number can be expressed as the limit of a sequence of rational numbers, exponentiation of a positive real number b with an arbitrary real exponent x can be defined by continuity with the rule [24] b^x = lim b^r as rational r → x,

where the limit is taken only over rational values of r. This limit only exists for positive b. The (ε, δ)-definition of limit is used; this involves showing that for any desired accuracy of the result b^x one can choose a sufficiently small interval around x so all the rational powers in the interval are within the desired accuracy.

For example, if x = π, the nonterminating decimal representation π = 3.14159… can be used (based on strict monotonicity of the rational power) to obtain the intervals bounded by rational powers [b^3.14, b^3.15], [b^3.141, b^3.142], …, which (for b > 1) shrink down onto the single value b^π.
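The definition is easy to watch converge numerically. A small sketch (Python; the choice of truncated decimals for r is just one convenient rational sequence):

```python
from fractions import Fraction
import math

b = 2.0
for digits in range(1, 7):
    r = Fraction(round(math.pi * 10 ** digits), 10 ** digits)  # rational r -> pi
    print(r, b ** float(r))

print("limit:", b ** math.pi)   # the rational powers approach 2**pi ~ 8.825
```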

The exponential function

The important mathematical constant e , sometimes called Euler's number, is approximately equal to 2.718 and is the base of the natural logarithm. Although exponentiation of e could, in principle, be treated the same as exponentiation of any other real number, such exponentials turn out to have particularly elegant and useful properties. Among other things, these properties allow exponentials of e to be generalized in a natural way to other types of exponents, such as complex numbers or even matrices, while coinciding with the familiar meaning of exponentiation with rational exponents.

As a consequence, the notation e^x usually denotes a generalized exponentiation definition called the exponential function, exp(x), which can be defined in many equivalent ways, for example, by exp(x) = Σ_(k=0..∞) x^k / k!.

Among other properties, exp satisfies the exponential identity

exp(x + y) = exp(x) ⋅ exp(y).

The exponential function is defined for all integer, fractional, real, and complex values of x. In fact, the matrix exponential is well-defined for square matrices (in which case this exponential identity only holds when x and y commute) and is useful for solving systems of linear differential equations.

Since exp(1) is equal to e, and exp(x) satisfies this exponential identity, it immediately follows that exp(x) coincides with the repeated-multiplication definition of e x for integer x, and it also follows that rational powers denote (positive) roots as usual, so exp(x) coincides with the e x definitions in the previous section for all real x by continuity.

Powers via logarithms

When e^x is defined as the exponential function, b^x can be defined, for other positive real numbers b, in terms of e^x. Specifically, the natural logarithm ln(x) is the inverse of the exponential function e^x. It is defined for b > 0, and satisfies e^(ln b) = b.

If b^x is to preserve the logarithm and exponent rules, then one must have b^x = (e^(ln b))^x = e^(x ln b).

This can be used as an alternative definition of the real-number power b x and agrees with the definition given above using rational exponents and continuity. The definition of exponentiation using logarithms is more common in the context of complex numbers, as discussed below.
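As a sketch of this definition in code (Python's math module; `real_power` is an illustrative name):

```python
import math

def real_power(b, x):
    """b^x for b > 0 via the logarithm definition b^x = exp(x * ln b)."""
    return math.exp(x * math.log(b))

print(real_power(9.0, 0.5), 9.0 ** 0.5)   # both approximately 3.0
```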

Real exponents with negative bases

Powers of a positive real number are always positive real numbers. The solution of x^2 = 4, however, can be either 2 or −2. The principal value of 4^(1/2) is 2, but −2 is also a valid square root. If the definition of exponentiation of real numbers is extended to allow negative results then the result is no longer well-behaved.

Neither the logarithm method nor the rational exponent method can be used to define b^r as a real number for a negative real number b and an arbitrary real number r. Indeed, e^r is positive for every real number r, so ln(b) is not defined as a real number for b ≤ 0.

The rational exponent method cannot be used for negative values of b because it relies on continuity. The function f(r) = b^r has a unique continuous extension [24] from the rational numbers to the real numbers for each b > 0. But when b < 0, the function f is not even continuous on the set of rational numbers r for which it is defined.

For example, consider b = −1. The nth root of −1 is −1 for every odd natural number n. So if n is an odd positive integer, (−1)^(m/n) = −1 if m is odd, and (−1)^(m/n) = 1 if m is even. Thus the set of rational numbers q for which (−1)^q = 1 is dense in the rational numbers, as is the set of q for which (−1)^q = −1. This means that the function (−1)^q is not continuous at any rational number q where it is defined.

On the other hand, arbitrary complex powers of negative numbers b can be defined by choosing a complex logarithm of b.

Irrational exponents

If b is a positive real algebraic number, and x is a rational number, it has been shown above that b^x is an algebraic number. This remains true even if one accepts any algebraic number for b, with the only difference that b^x may take several values (a finite number, see below), which are all algebraic. The Gelfond–Schneider theorem provides some information on the nature of b^x when x is irrational (that is, not rational). It states:

If b is an algebraic number different from 0 and 1, and x an irrational algebraic number, then all values of b^x (there are infinitely many) are transcendental (that is, not algebraic).

If b is a positive real number, and z is any complex number, the power b^z is defined by b^z = e^(z ln b),

where x = ln(b) is the unique real solution to the equation e^x = b, and the complex power of e is defined by the exponential function, which is the unique function of a complex variable that is equal to its derivative and takes the value 1 for x = 0.

As, in general, b^z is not a real number, an expression such as (b^z)^w is not defined by the previous definition. It must be interpreted via the rules for powers of complex numbers, and, unless z is real or w is integer, does not generally equal b^(zw), as one might expect.

There are various definitions of the exponential function but they extend compatibly to complex numbers and satisfy the exponential property. For any complex numbers z and w, the exponential function satisfies e^(z+w) = e^z e^w. In particular, for any complex number z = x + iy, e^z = e^x (cos y + i sin y).

This formula links problems in trigonometry and algebra.

Therefore, for any complex number z = x + iy, b^z = b^x (cos(y ln b) + i sin(y ln b)).

Series definition

The exponential function being equal to its derivative and satisfying e^0 = 1, its Taylor series must be e^z = Σ_(n=0..∞) z^n / n!.

This infinite series, which is often taken as the definition of the exponential function e z for arbitrary complex exponents, is absolutely convergent for all complex numbers z.

When z is purely imaginary, that is, z = iy for a real number y, the series above becomes Σ_(n=0..∞) (iy)^n / n!,

which (because it converges absolutely) may be reordered to (Σ_(k=0..∞) (−1)^k y^(2k) / (2k)!) + i (Σ_(k=0..∞) (−1)^k y^(2k+1) / (2k+1)!).

The real and the imaginary parts of this expression are Taylor expansions of cosine and sine respectively, centered at zero, implying Euler's formula: e^(iy) = cos y + i sin y.
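The formula is easy to sanity-check with Python's cmath module:

```python
import cmath, math

y = 1.2
lhs = cmath.exp(1j * y)                  # e^{iy}
rhs = complex(math.cos(y), math.sin(y))  # cos y + i sin y
print(abs(lhs - rhs) < 1e-12)            # True
```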


Periodicity

That is, the complex exponential function e^z = exp(z) satisfies exp(z) = exp(z + 2kπi) for any integer k, and is thus a periodic function with period 2πi.

Examples

Integer powers of nonzero complex numbers are defined by repeated multiplication or division as above. If i is the imaginary unit and n is an integer, then i^n equals 1, i, −1, or −i, according to whether the integer n is congruent to 0, 1, 2, or 3 modulo 4. Because of this, the powers of i are useful for expressing sequences of period 4.

Complex powers of positive reals are defined via e^x as in section Complex exponents with positive real bases above. These are continuous functions.

Trying to extend these functions to the general case of noninteger powers of complex numbers that are not positive reals leads to difficulties. Either we define discontinuous functions or multivalued functions. Neither of these options is entirely satisfactory.

The rational power of a complex number must be the solution to an algebraic equation. Therefore, it always has a finite number of possible values. For example, w = z^(1/2) must be a solution to the equation w^2 = z. But if w is a solution, then so is −w, because (−1)^2 = 1. A unique but somewhat arbitrary solution called the principal value can be chosen using a general rule which also applies for nonrational powers.

Complex powers and logarithms are more naturally handled as single valued functions on a Riemann surface. Single valued versions are defined by choosing a sheet. The value has a discontinuity along a branch cut. Choosing one out of many solutions as the principal value leaves us with functions that are not continuous, and the usual rules for manipulating powers can lead us astray.

Any nonrational power of a complex number has an infinite number of possible values because of the multi-valued nature of the complex logarithm. The principal value is a single value chosen from these by a rule which, amongst its other properties, ensures powers of complex numbers with a positive real part and zero imaginary part give the same value as does the rule defined above for the corresponding real base.

Exponentiating a real number to a complex power is formally a different operation from that for the corresponding complex number. However, in the common case of a positive real number the principal value is the same.

The powers of negative real numbers are not always defined and are discontinuous even where defined. In fact, they are only defined when the exponent is a rational number with the denominator being an odd integer. When dealing with complex numbers the complex number operation is normally used instead.

Complex exponents with complex bases

For complex numbers w and z with w ≠ 0, the notation w^z is ambiguous in the same sense that log w is.

To obtain a value of w^z, first choose a logarithm of w; call it log w. Such a choice may be the principal value Log w (the default, if no other specification is given), or perhaps a value given by some other branch of log w fixed in advance. Then, using the complex exponential function one defines w^z = e^(z log w),

because this agrees with the earlier definition in the case where w is a positive real number and the (real) principal value of log w is used.

If z is an integer, then the value of w^z is independent of the choice of log w, and it agrees with the earlier definition of exponentiation with an integer exponent.

If z is a rational number m/n in lowest terms with n > 0, then the countably infinitely many choices of log w yield only n different values for w^z; these values are the n complex solutions s to the equation s^n = w^m.

If z is an irrational number, then the countably infinitely many choices of log w lead to infinitely many distinct values for w^z.

The computation of complex powers is facilitated by converting the base w to polar form, as described in detail below.

A similar construction is employed in quaternions.

Complex roots of unity

A complex number w such that w^n = 1 for a positive integer n is an nth root of unity. Geometrically, the nth roots of unity lie on the unit circle of the complex plane at the vertices of a regular n-gon with one vertex on the real number 1.

If w^n = 1 but w^k ≠ 1 for all natural numbers k such that 0 < k < n, then w is called a primitive nth root of unity. The negative unit −1 is the only primitive square root of unity. The imaginary unit i is one of the two primitive 4th roots of unity; the other one is −i.

The nth roots of unity can then naturally be expressed as e^(2πik/n) = cos(2πk/n) + i sin(2πk/n), for k = 0, 1, …, n − 1.

Roots of arbitrary complex numbers

Although there are infinitely many possible values for a general complex logarithm, there are only a finite number of values for the power w^q in the important special case where q = 1/n and n is a positive integer. These are the nth roots of w; they are solutions of the equation z^n = w. As with real roots, a second root is also called a square root and a third root is also called a cube root.

It is usual in mathematics to define w^(1/n) as the principal value of the root, which is, conventionally, the nth root whose argument has the smallest absolute value. When w is a positive real number, this is coherent with the usual convention of defining w^(1/n) as the unique positive real nth root. On the other hand, when w is a negative real number, and n is an odd integer, the unique real nth root is not one of the two nth roots whose argument has the smallest absolute value. In this case, the meaning of w^(1/n) may depend on the context, and some care may be needed for avoiding errors.

The set of nth roots of a complex number w is obtained by multiplying the principal value w^(1/n) by each of the nth roots of unity. For example, the fourth roots of 16 are 2, −2, 2i, and −2i, because the principal value of the fourth root of 16 is 2 and the fourth roots of unity are 1, −1, i, and −i.
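A minimal sketch of this recipe (Python; `nth_roots` is an illustrative helper):

```python
import cmath

def nth_roots(w, n):
    """All n-th roots of w: the principal root times each n-th root of unity."""
    principal = complex(w) ** (1.0 / n)
    return [principal * cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for r in nth_roots(16, 4):
    print(r)   # approximately 2, 2i, -2, -2i (up to rounding)
```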

Computing complex powers

It is often easier to compute complex powers by writing the number to be exponentiated in polar form. Every complex number z can be written in the polar form z = r e^(iθ) = r (cos θ + i sin θ),

where r is a nonnegative real number and θ is the (real) argument of z. The polar form has a simple geometric interpretation: if a complex number u + iv is thought of as representing a point (u, v) in the complex plane using Cartesian coordinates, then (r, θ) is the same point in polar coordinates. That is, r is the "radius" (r^2 = u^2 + v^2) and θ is the "angle" (θ = atan2(v, u)). The polar angle θ is ambiguous since any integer multiple of 2π could be added to θ without changing the location of the point. Each choice of θ gives in general a different possible value of the power. A branch cut can be used to choose a specific value. The principal value (the most common branch cut) corresponds to θ chosen in the interval (−π, π]. For complex numbers with a positive real part and zero imaginary part, using the principal value gives the same result as using the corresponding real number.

In order to compute the complex power w^z, write w in polar form: w = r e^(iθ).

If z is decomposed as c + di, then the formula for w^z can be written more explicitly as w^z = (r^c e^(−dθ)) e^(i(d ln r + cθ)) = (r^c e^(−dθ)) (cos(d ln r + cθ) + i sin(d ln r + cθ)).

This final formula allows complex powers to be computed easily from decompositions of the base into polar form and the exponent into Cartesian form. It is shown here both in polar form and in Cartesian form (via Euler's identity).

The following examples use the principal value, the branch cut which causes θ to be in the interval (−π, π]. To compute i^i, write i in polar and Cartesian forms: i = 1 ⋅ e^(iπ/2) = 0 + 1i. Then the formula above, with r = 1, θ = π/2, c = 0, and d = 1, gives i^i = e^(−π/2) ≈ 0.2079.

Similarly, to find (−2)^(3+4i), compute the polar form of −2: −2 = 2 e^(iπ),

and use the formula above to compute (−2)^(3+4i) = (2^3 e^(−4π)) (cos(4 ln 2 + 3π) + i sin(4 ln 2 + 3π)).

The value of a complex power depends on the branch used. For example, if the polar form i = 1 ⋅ e^(5πi/2) is used to compute i^i, the power is found to be e^(−5π/2); the principal value of i^i, computed above, is e^(−π/2). The set of all possible values for i^i is given by [28] e^(−π/2 − 2kπ), for any integer k.

So there is an infinity of values that are possible candidates for the value of i^i, one for each integer k. All of them have a zero imaginary part, so one can say i^i has an infinity of valid real values.
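This is visible directly in Python, whose complex power uses the principal branch:

```python
import math

print(1j ** 1j)              # (0.20787957635076193+0j), i.e. e^(-pi/2)
for k in range(3):
    # values from other branches of the logarithm
    print(math.exp(-(math.pi / 2 + 2 * math.pi * k)))
```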

Failure of power and logarithm identities

Some identities for powers and logarithms for positive real numbers will fail for complex numbers, no matter how complex powers and complex logarithms are defined as single-valued functions. For example:

  • The identity log(b^x) = x ⋅ log b holds whenever b is a positive real number and x is a real number. But for the principal branch of the complex logarithm one has iπ = log(−1) = log[(−i)^2] ≠ 2 log(−i) = 2(−iπ/2) = −iπ.

Regardless of which branch of the logarithm is used, a similar failure of the identity will exist. The best that can be said (if only using this result) is that log(w^z) ≡ z ⋅ log w (mod 2πi).

This identity does not hold even when considering log as a multivalued function. The possible values of log(w^z) contain those of z ⋅ log w as a subset. Using Log(w) for the principal value of log(w) and m, n as any integers, the possible values of both sides are: log(w^z) = z ⋅ Log(w) + z ⋅ 2πin + 2πim, while z ⋅ log w = z ⋅ Log(w) + z ⋅ 2πin.

On the other hand, when x is an integer, the identities are valid for all nonzero complex numbers.

Monoids

Exponentiation with integer exponents can be defined in any multiplicative monoid. [30] A monoid is an algebraic structure consisting of a set X together with a rule for composition ("multiplication") satisfying an associative law and a multiplicative identity, denoted by 1. Exponentiation is defined inductively by x^0 = 1 and x^(n+1) = x ⋅ x^n.

Monoids include many structures of importance in mathematics, including groups and rings (under multiplication), with more specific examples of the latter being matrix rings and fields.

Matrices and linear operators

If A is a square matrix, then the product of A with itself n times is called the matrix power. Also A^0 is defined to be the identity matrix, [32] and if A is invertible, then A^(−n) = (A^(−1))^n.

These examples are for discrete exponents of linear operators, but in many circumstances it is also desirable to define powers of such operators with continuous exponents. This is the starting point of the mathematical theory of semigroups. [34] Just as computing matrix powers with discrete exponents solves discrete dynamical systems, so does computing matrix powers with continuous exponents solve systems with continuous dynamics. Examples include approaches to solving the heat equation, Schrödinger equation, wave equation, and other partial differential equations including a time evolution. The special case of exponentiating the derivative operator to a non-integer power is called the fractional derivative which, together with the fractional integral, is one of the basic operations of the fractional calculus.

Finite fields

A field is an algebraic structure in which multiplication, addition, subtraction, and division are all well-defined and satisfy their familiar properties. The real numbers, for example, form a field, as do the complex numbers and rational numbers. Unlike these familiar examples of fields, which are all infinite sets, some fields have only finitely many elements. The simplest example is the field with two elements F_2 = {0, 1}, with addition defined by 0 + 1 = 1 + 0 = 1 and 0 + 0 = 1 + 1 = 0, and multiplication 0 ⋅ 0 = 1 ⋅ 0 = 0 ⋅ 1 = 0 and 1 ⋅ 1 = 1.

Exponentiation in finite fields has applications in public key cryptography. For example, the Diffie–Hellman key exchange uses the fact that exponentiation is computationally inexpensive in finite fields, whereas the discrete logarithm (the inverse of exponentiation) is computationally expensive.
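A toy sketch of that asymmetry (Python; the prime and base are small demo values chosen by me, emphatically not a production-grade group):

```python
import secrets

p = 2 ** 64 - 59          # assumption: a small demo prime, far too small for real use
g = 5                     # assumption: an arbitrary base for illustration
a = secrets.randbelow(p - 2) + 1        # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1        # Bob's secret exponent
A, B = pow(g, a, p), pow(g, b, p)       # fast: modular exponentiation by squaring
assert pow(B, a, p) == pow(A, b, p)     # both sides derive the same shared value
# Recovering a from (g, p, A) is the discrete logarithm problem, believed hard.
```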

In abstract algebra

Exponentiation for integer exponents can be defined for quite general structures in abstract algebra.

Let X be a set with a power-associative binary operation which is written multiplicatively. Then x^n is defined for any element x of X and any nonzero natural number n as the product of n copies of x, which is recursively defined by x^1 = x and x^(n+1) = x^n ⋅ x.

One has the following properties: x^(m+n) = x^m ⋅ x^n and (x^m)^n = x^(mn).

If the operation has a two-sided identity element 1, then x^0 is defined to be equal to 1 for any x.

If the operation also has two-sided inverses and is associative, then the magma is a group. The inverse of x can be denoted by x^(−1) and follows all the usual rules for exponents: x^(−n) = (x^(−1))^n.

If the multiplication operation is commutative (as, for instance, in abelian groups), then the following holds: (x ⋅ y)^n = x^n ⋅ y^n.

If the binary operation is written additively, as it often is for abelian groups, then "exponentiation is repeated multiplication" can be reinterpreted as "multiplication is repeated addition". Thus, each of the laws of exponentiation above has an analogue among laws of multiplication.

When there are several power-associative binary operations defined on a set, any of which might be iterated, it is common to indicate which operation is being repeated by placing its symbol in the superscript. Thus, x^(∗n) is x ∗ ⋯ ∗ x, while x^(#n) is x # ⋯ # x, whatever the operations ∗ and # might be.

Superscript notation is also used, especially in group theory, to indicate conjugation. That is, g^h = h^(−1)gh, where g and h are elements of some group. Although conjugation obeys some of the same laws as exponentiation, it is not an example of repeated multiplication in any sense. A quandle is an algebraic structure in which these laws of conjugation play a central role.

Over sets

If n is a natural number, and A is an arbitrary set, then the expression A^n is often used to denote the set of ordered n-tuples of elements of A. This is equivalent to letting A^n denote the set of functions from the set {0, 1, 2, …, n − 1} to the set A; the n-tuple (a_0, a_1, a_2, …, a_(n−1)) represents the function that sends i to a_i.

For an infinite cardinal number κ and a set A, the notation A^κ is also used to denote the set of all functions from a set of size κ to A. This is sometimes written ^κA to distinguish it from cardinal exponentiation, defined below.

This generalized exponential can also be defined for operations on sets or for sets with extra structure. For example, in linear algebra, it makes sense to index direct sums of vector spaces over arbitrary index sets. That is, we can speak of the direct sum ⨁_(i ∈ N) V_i,

where each Vi is a vector space.

Then if V_i = V for each i, the resulting direct sum can be written in exponential notation as V^(⊕N), or simply V^N with the understanding that the direct sum is the default. We can again replace the set N with a cardinal number n to get V^n, although without choosing a specific standard set with cardinality n, this is defined only up to isomorphism. Taking V to be the field R of real numbers (thought of as a vector space over itself) and n to be some natural number, we get the vector space that is most commonly studied in linear algebra, the real vector space R^n.

If the base of the exponentiation operation is a set, the exponentiation operation is the Cartesian product unless otherwise stated. Since multiple Cartesian products produce an n-tuple, which can be represented by a function on a set of appropriate cardinality, S^N becomes simply the set of all functions from N to S in this case.

This fits in with the exponentiation of cardinal numbers, in the sense that |S^N| = |S|^|N|, where |X| is the cardinality of X. When "2" is defined as {0, 1}, we have |2^X| = 2^|X|, where 2^X, usually denoted by P(X), is the power set of X; each subset Y of X corresponds uniquely to a function on X taking the value 1 for x ∈ Y and 0 for x ∉ Y.

In category theory

In a Cartesian closed category, the exponential operation can be used to raise an arbitrary object to the power of another object. This generalizes the Cartesian product in the category of sets. If 0 is an initial object in a Cartesian closed category, then the exponential object 0^0 is isomorphic to any terminal object 1.

Of cardinal and ordinal numbers

In set theory, there are exponential operations for cardinal and ordinal numbers.

If κ and λ are cardinal numbers, the expression κ^λ represents the cardinality of the set of functions from any set of cardinality λ to any set of cardinality κ. [35] If κ and λ are finite, then this agrees with the ordinary arithmetic exponential operation. For example, the set of 3-tuples of elements from a 2-element set has cardinality 8 = 2^3. In cardinal arithmetic, κ^0 is always 1 (even if κ is an infinite cardinal or zero).

Exponentiation of cardinal numbers is distinct from exponentiation of ordinal numbers, which is defined by a limit process involving transfinite induction.

Just as exponentiation of natural numbers is motivated by repeated multiplication, it is possible to define an operation based on repeated exponentiation; this operation is sometimes called hyper-4 or tetration. Iterating tetration leads to another operation, and so on, a concept named hyperoperation. This sequence of operations is expressed by the Ackermann function and Knuth's up-arrow notation. Just as exponentiation grows faster than multiplication, which is faster-growing than addition, tetration is faster-growing than exponentiation. Evaluated at (3, 3), the functions addition, multiplication, exponentiation, and tetration yield 6, 9, 27, and 7 625 597 484 987 (= 3^27 = 3^(3^3) = ³3) respectively.

Zero to the power of zero gives a number of examples of limits that are of the indeterminate form 0^0. The limits in these examples exist, but have different values, showing that the two-variable function x^y has no limit at the point (0, 0). One may consider at what points this function does have a limit.

More precisely, consider the function f(x, y) = x^y defined on D = {(x, y) ∈ R^2 : x > 0}. Then D can be viewed as a subset of R̄^2 (that is, the set of all pairs (x, y) with x, y belonging to the extended real number line R̄ = [−∞, +∞], endowed with the product topology), which will contain the points at which the function f has a limit.

In fact, f has a limit at all accumulation points of D, except for (0, 0), (+∞, 0), (1, +∞) and (1, −∞). [36] Accordingly, this allows one to define the powers x^y by continuity whenever 0 ≤ x ≤ +∞, −∞ ≤ y ≤ +∞, except for 0^0, (+∞)^0, 1^(+∞) and 1^(−∞), which remain indeterminate forms.

Under this definition by continuity, we obtain:

  • x^(+∞) = +∞ and x^(−∞) = 0, when 1 < x ≤ +∞.
  • x^(+∞) = 0 and x^(−∞) = +∞, when 0 ≤ x < 1.
  • 0^y = 0 and (+∞)^y = +∞, when 0 < y ≤ +∞.
  • 0^y = +∞ and (+∞)^y = 0, when −∞ ≤ y < 0.

These powers are obtained by taking limits of x^y for positive values of x. This method does not permit a definition of x^y when x < 0, since pairs (x, y) with x < 0 are not accumulation points of D.

On the other hand, when n is an integer, the power x^n is already meaningful for all values of x, including negative ones. This may make the definition 0^n = +∞ obtained above for negative n problematic when n is odd, since in this case x^n → +∞ as x tends to 0 through positive values, but not negative ones.

Computing b^n using iterated multiplication requires n − 1 multiplication operations, but it can be computed more efficiently than that, as illustrated by the following example. To compute 2^100, note that 100 = 64 + 32 + 4. Compute the following in order:

2^2 = 4
(2^2)^2 = 2^4 = 16
(2^4)^2 = 2^8 = 256
(2^8)^2 = 2^16 = 65 536
(2^16)^2 = 2^32 = 4 294 967 296
(2^32)^2 = 2^64 = 18 446 744 073 709 551 616
2^64 ⋅ 2^32 ⋅ 2^4 = 2^100 = 1 267 650 600 228 229 401 496 703 205 376

This series of steps only requires 8 multiplication operations (the last product above takes 2 multiplications) instead of 99.

In general, the number of multiplication operations required to compute b n can be reduced to Θ(log n) by using exponentiation by squaring or (more generally) addition-chain exponentiation. Finding the minimal sequence of multiplications (the minimal-length addition chain for the exponent) for b n is a difficult problem, for which no efficient algorithms are currently known (see Subset sum problem), but many reasonably efficient heuristic algorithms are available. [37]
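A compact sketch of exponentiation by squaring (Python; `power_by_squaring` is an illustrative name):

```python
def power_by_squaring(b, n):
    """Compute b**n for n >= 0 using O(log n) multiplications."""
    result = 1
    while n > 0:
        if n & 1:          # low bit of the exponent set: multiply the result in
            result *= b
        b *= b             # square the base
        n >>= 1            # shift to the next binary digit of the exponent
    return result

assert power_by_squaring(2, 100) == 2 ** 100
```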

Placing an integer superscript after the name or symbol of a function, as if the function were being raised to a power, commonly refers to repeated function composition rather than repeated multiplication. [38] [39] [40] Thus, f^3(x) may mean f(f(f(x))); [41] in particular, f^(−1)(x) usually denotes the inverse function of f. This notation was introduced by Hans Heinrich Bürmann [39] [40] and John Frederick William Herschel. [38] [39] [40] Iterated functions are of interest in the study of fractals and dynamical systems. Babbage was the first to study the problem of finding a functional square root f^(1/2)(x).

To distinguish exponentiation from function composition, the common usage is to write the exponent after the parenthesis enclosing the argument of the function; that is, f(x)^3 means (f(x))^3, and f(x)^(−1) means 1/f(x).

For historical reasons, and because of the ambiguity resulting from not enclosing arguments with parentheses, a superscript after a function name applied specifically to the trigonometric and hyperbolic functions has a deviating meaning: a positive exponent applied to the function's abbreviation means that the result is raised to that power, [42] [43] [44] [45] [46] [47] [48] [20] [40] while an exponent of −1 still denotes the inverse function. [40] That is, sin^2 x is just a shorthand way to write (sin x)^2 = sin(x)^2 without using parentheses, [16] [49] [50] [51] [52] [53] [54] [20] whereas sin^(−1) x refers to the inverse function of the sine, also called arcsin x. Each trigonometric and hyperbolic function has its own name and abbreviation both for the reciprocal (for example, 1/(sin x) = (sin x)^(−1) = sin(x)^(−1) = csc x), and its inverse (for example cosh^(−1) x = arcosh x). A similar convention exists for logarithms, [40] where today log^2 x usually means (log x)^2, not log log x. [40]

To avoid ambiguity, some mathematicians choose to use ∘ to denote the compositional meaning, writing f^(∘n)(x) for the n-th iterate of the function f(x), as in, for example, f^(∘3)(x) meaning f(f(f(x))). For the same purpose, f^[n](x) was used by Benjamin Peirce [55] [40] whereas Alfred Pringsheim and Jules Molk suggested ^n f(x) instead. [56] [40] [nb 1]

Programming languages generally express exponentiation either as an infix operator or as a (prefix) function, as they are linear notations which do not support superscripts:

  • x ↑ y : Algol, Commodore BASIC, TRS-80 Level II/III BASIC. [57][58]
  • x ^ y : AWK, BASIC, J, MATLAB, Wolfram Language (Mathematica), R, Microsoft Excel, Analytica, TeX (and its derivatives), TI-BASIC, bc (for integer exponents), Haskell (for nonnegative integer exponents), Lua and most computer algebra systems. Conflicting uses of the symbol ^ include: XOR (in POSIX Shell arithmetic expansion, AWK, C, C++, C#, D, Go, Java, JavaScript, Perl, PHP, Python, Ruby and Tcl), indirection (Pascal), and string concatenation (OCaml and Standard ML).
  • x ^^ y : Haskell (for fractional base, integer exponents), D.
  • x ** y : Ada, Z shell, KornShell, Bash, COBOL, CoffeeScript, Fortran, FoxPro, Gnuplot, Groovy, JavaScript, OCaml, F#, Perl, PHP, PL/I, Python, Rexx, Ruby, SAS, Seed7, Tcl, ABAP, Mercury, Haskell (for floating-point exponents), Turing, VHDL.
  • pown x y : F# (for integer base, integer exponent).
  • x⋆y : APL.

Many other programming languages lack syntactic support for exponentiation, but provide library functions:

  • pow(x, y) : C, C++.
  • Math.Pow(x, y) : C#.
  • math:pow(X, Y) : Erlang.
  • Math.pow(x, y) : Java.
  • [Math]::Pow(x, y) : PowerShell.
  • (expt x y) : Common Lisp.

For certain exponents there are special ways to compute x^y much faster than through generic exponentiation. These cases include small positive and negative integers (prefer x ⋅ x over x^2; prefer 1/x over x^(−1)) and roots (prefer sqrt(x) over x^0.5, prefer cbrt(x) over x^(1/3)).


Powers and Exponentials

This topic shows how to compute matrix powers and exponentials using a variety of methods.

Positive Integer Powers

If A is a square matrix and p is a positive integer, then A^p effectively multiplies A by itself p-1 times. For example:

Inverse and Fractional Powers

If A is square and nonsingular, then A^(-p) effectively multiplies inv(A) by itself p-1 times.

MATLAB® calculates inv(A) and A^(-1) with the same algorithm, so the results are exactly the same. Both inv(A) and A^(-1) produce warnings if the matrix is close to being singular.

Fractional powers, such as A^(2/3) , are also permitted. The results using fractional powers depend on the distribution of the eigenvalues of the matrix.

Element-by-Element Powers

The .^ operator calculates element-by-element powers. For example, to square each element in a matrix you can use A.^2 .

Square Roots

The sqrt function is a convenient way to calculate the square root of each element in a matrix. An alternate way to do this is A.^(1/2) .

For other roots, you can use nthroot . For example, calculate A.^(1/3) .

These element-wise roots differ from the matrix square root, which calculates a second matrix B such that A = B*B. The function sqrtm(A) computes A^(1/2) by a more accurate algorithm. The m in sqrtm distinguishes this function from sqrt(A), which, like A.^(1/2), does its job element-by-element.
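For readers outside MATLAB, the same distinction can be sketched with NumPy/SciPy (an analogue chosen by me, not the MATLAB implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[5.0, 4.0], [4.0, 5.0]])
E = np.sqrt(A)     # element-by-element, the analogue of A.^(1/2)
B = sqrtm(A)       # matrix square root, the analogue of sqrtm(A)

print(np.allclose(E @ E, A))   # False: elementwise roots are not a matrix root
print(np.allclose(B @ B, A))   # True: B*B reproduces A
```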

Scalar Bases

In addition to raising a matrix to a power, you also can raise a scalar to the power of a matrix.

When you raise a scalar to the power of a matrix, MATLAB uses the eigenvalues and eigenvectors of the matrix to calculate the matrix power. If [V,D] = eig(A), then 2^A = V*2^D*V^(-1).

Matrix Exponentials

The matrix exponential is a special case of raising a scalar to a matrix power. The base for a matrix exponential is Euler's number e = exp(1) .

The expm function is a more convenient way to calculate matrix exponentials.

The matrix exponential can be calculated in a number of ways. See Matrix Exponentials for more information.

Dealing with Small Numbers

The MATLAB functions log1p and expm1 calculate log(1 + x) and e^x − 1 accurately for very small values of x. For example, if you try to add a number smaller than machine precision to 1, then the result gets rounded to 1.
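The effect is easy to reproduce outside MATLAB as well; Python's math module exposes the same pair of functions:

```python
import math

x = 1e-16
print(math.exp(x) - 1.0)    # 0.0: the 1 swamps x in double precision
print(math.expm1(x))        # 1e-16: accurate
print(math.log(1.0 + x))    # 0.0: same cancellation problem
print(math.log1p(x))        # ~1e-16: accurate
```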


Mike's Toolbox

In mathematics, a frequently occurring computation is to find the sum of consecutive powers of a number. For example, we may need to find the sum of powers of a number x:

Sum = x^5 + x^4 + x^3 + x^2 + x + 1

Recall that a power such as x^3 means to multiply 3 x's together (3 is called the exponent): x^3 = x · x · x.

If you knew the value of x, it would be possible to compute all of the powers and add them together to find the sum. For example, if x had the value 2, Sum would be:

Sum = 2^5 + 2^4 + 2^3 + 2^2 + 2 + 1
    = 32 + 16 + 8 + 4 + 2 + 1
    = 63

Even though it is possible to compute the sum as just shown, it is both tedious and error prone. Fortunately there is a compact equation that computes the sum without needing to calculate all of the powers. To derive the formula, we just need to notice what happens when we multiply both sides of the original equation by x:

Sum · x = (x^5 + x^4 + x^3 + x^2 + x + 1) · x
        = x^6 + x^5 + x^4 + x^3 + x^2 + x

All of the exponents increased by one. Notice that most of the terms on the right side of the equation are the same as in the original Sum above. In fact, they are all there except for the value 1, so let's add one to both sides:

Sum · x + 1 = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
            = x^6 + (x^5 + x^4 + x^3 + x^2 + x + 1)
            = x^6 + Sum

We can rearrange this equation so that all terms containing Sum end up on the left side: Sum · x − Sum = x^6 − 1, that is, Sum · (x − 1) = x^6 − 1.

Dividing both sides by (x − 1) gives us a nice compact formula for the sum of consecutive powers of a number: Sum = (x^6 − 1) / (x − 1).

Note that the power in the compact formula is just one more than the highest power in the sum that you are trying to determine. If we evaluate the equation with x set equal to 2 we see that it computes the correct answer:

Sum = (2^6 − 1) / (2 − 1) = (64 − 1) / 1 = 63

One caveat to this is that the equation does not work when x = 1. This is because we divided both sides of the above equation by (x − 1). When x = 1, this term evaluates to zero, and you can't divide by zero. Fortunately it is easy to see what the value of the Sum would be if x were equal to one. Each of the powers in the Sum evaluates to 1, so the Sum is just the number of terms added together, which in this case would be 6, or one more than the highest exponent in the Sum.
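The whole derivation fits in a few lines of code (Python sketch; `sum_of_powers` is an illustrative name, and integer division is used since the numerator is always divisible by x − 1):

```python
def sum_of_powers(x, highest):
    """x**highest + ... + x + 1 via the compact formula, with the x == 1 caveat."""
    if x == 1:
        return highest + 1               # every term is 1; count the terms
    return (x ** (highest + 1) - 1) // (x - 1)

assert sum_of_powers(2, 5) == 63         # matches 32 + 16 + 8 + 4 + 2 + 1
assert sum_of_powers(1, 5) == 6          # the x = 1 special case
```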


Rip’s Applied Mathematics Blog

There was a lot going on in the example of the SN decomposition (2 of 3). First off, we found eigenvalues of a non-diagonable matrix A, and constructed a diagonal matrix D from them. Then we found 2 eigenvectors and 1 generalized eigenvector of A, and used them to construct a transition matrix P. We used that transition matrix to go from our diagonal D back to the original basis, and find S similar to D.

So S is diagonable while A is not. And A and S have the same eigenvalues and the columns of P should be eigenvectors of S. They are. The generalized eigenvector that we found for A is an (ordinary) eigenvector of S, but we had to get a generalized eigenvector of A in order to construct S from D.

I wonder. Can I understand the distinction between eigenvectors and generalized eigenvectors by studying S and A? We’ll see.

I would also remark that A was special in one sense: it was lower triangular. It is not an accident that the eigenvalues of A are its diagonal elements. We could have written the diagonal matrix D by inspection. Instead, I got it together with the eigenvectors.

A question: what does P do to A? We compute

It brings A to Jordan Canonical Form. (Did I make an especially good choice for the generalized eigenvector? I don’t know. It may be that any choice would have led to the JCF or maybe not. My vague recollection from examples years ago is that I lucked out, that in general I have to use some of that considerable freedom I found in v to get JCF.)

If finding generalized eigenvectors is relatively painless for you, you may be happy with the N+S decomposition. (Of course, if you have a “matrix exponential” command available, you’re done.) If not, another possibility is to use the Cayley-Hamilton theorem: any matrix satisfies its own characteristic equation. In this example, the characteristic equation is

(The roots of that are the eigenvalues, of course.) The Cayley-Hamilton theorem says that

where the RHS must be the 3×3 zero matrix, and the −4 on the LHS must multiply the 3×3 identity matrix. And, indeed, A and its powers satisfy that equation.

I used to wonder what the Cayley-Hamilton theorem was good for. One thing it’s good for is turning higher powers of A into lower ones. Use it to express A^3 in terms of I, A, and A^2. Then reduce A^4, and keep going. For our example, we could replace the infinite series in A by 3 infinite series of scalars: one series multiplies I, another multiplies A, and the third multiplies A^2. Maybe we could see the patterns for the three scalar series more easily than a pattern for the series in A itself.

There is yet another way to do this; we’ll see it when I look at the spectral decomposition theorem out of Halmos.

Ah, there’s one last point I want to make. Schur’s lemma told us that any matrix could be brought to upper triangular form. I didn’t say this, but any nilpotent matrix is similar to a strictly upper triangular matrix (i.e. upper triangular with zero diagonal). It’s also similar to a strictly lower triangular form; I think upper or lower should just depend on an ordering of the basis.

So why not have just split our triangular matrix into diagonal plus nilpotent? Because we need S and N to commute. Our example is a perfect illustration: we have a lower triangular matrix

so we can write it as the sum of a diagonal matrix and a nilpotent matrix B.

While B is, indeed, nilpotent, it turns out that it’s B^3 which is equal to zero, not B^2.

Do B and the diagonal part commute? We compute one of the products:

No, they do not commute. It is nontrivial that we can find N and S which commute. We needed to go from D back to S, and get N = A – S.


3 Answers

This is a slightly modified version of my response on math.stackexchange. One standard approach to computing matrix functions times a vector $f(M)x$ or quadratic forms $x^Tf(M)x$ when $M$ is symmetric is via the Lanczos algorithm. Lanczos computes an orthonormal basis $Q_k = [q_1, \ldots, q_k]$ for the Krylov subspace $\operatorname{span}(x, Mx, \ldots, M^{k-1}x)$ by a Gram-Schmidt like procedure. This results in a factorization $MQ_k = Q_kT_k + \beta_k q_{k+1}e_k^T$, where $e_k$ is the $k$-th canonical basis vector and $T_k$ is a $k \times k$ symmetric tridiagonal matrix.

The "Lanczos approximation" to $f(M)x$ is defined as $Q_k f(T_k) Q_k^T x$ , and the approximation to $x^Tf(M)x$ is defined as $x^TQ_k f(T_k) Q_k^T x$ . Note that $f$ can be any scalar function, for instance $f(x) = exp(x)$ or $f(x) = x^i$ . There are software packages to compute $Q_k exp(T_k) Q_k^T x$ . This can easily be turned into the Lanczos approximation to $x^Texp(M)x$ by taking the inner product with $x$ . The Lanczos approximation to the quadratic form can be viewed as a certain Gaussian quadrature approximation to $x^T f(M) x$ and converges extremely quickly in $k$ if $f$ is analytic (see Golub Meurant "Matrices Moments and Quadrature").

This approach has runtime $O(T_{\mathrm{mv}} k + nk)$ (or $O(T_{\mathrm{mv}} k + nk^2)$ with full reorthogonalization), where $T_{\mathrm{mv}}$ is the cost of evaluating $v \mapsto Mv$. This will almost certainly be cheaper than an SVD or any $O(n^3)$ algorithm, since $k$ can typically be taken fairly small even for a machine-precision-accurate output.

Note that if $x$ is unit length, $Q_k^T x = e_1$. This means the Lanczos approximation to $x^T f(M) x$ is given by $e_1^T f(T_k) e_1$ and that $Q_k$ doesn't need to be stored. There are also some subtleties about whether full reorthogonalization should be used or not. Whether or not it is used, the output will still be accurate for the matrix exponential, but without reorthogonalization $k$ may need to be larger. However, reorthogonalization is more computationally expensive, so there is some tradeoff.

Lanczos and a function to compute the approximation of the quadratic form are easy to implement. Note that if you do not want to use reorthogonalization, you could rewrite the code to avoid storing Q.
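The implementation from the original answer isn't reproduced here; what follows is a minimal NumPy sketch under the notation above (the function names `lanczos` and `lanczos_quad_form` are mine):

```python
import numpy as np

def lanczos(matvec, x, k, reorth=True):
    """k steps of Lanczos for a symmetric operator, started from x.

    Returns Q (n-by-k, orthonormal columns) and the k-by-k symmetric
    tridiagonal T_k with M Q_k = Q_k T_k + beta_k q_{k+1} e_k^T.
    """
    n = x.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = x / np.linalg.norm(x)
    q_prev = np.zeros(n)
    b_prev = 0.0
    for i in range(k):
        Q[:, i] = q
        w = matvec(q) - b_prev * q_prev
        alpha[i] = q @ w
        w = w - alpha[i] * q
        if reorth:  # full reorthogonalization against all previous vectors
            w = w - Q[:, : i + 1] @ (Q[:, : i + 1].T @ w)
        beta[i] = np.linalg.norm(w)
        if beta[i] == 0.0:  # invariant subspace found; truncate and stop
            k = i + 1
            Q, alpha, beta = Q[:, :k], alpha[:k], beta[:k]
            break
        q_prev, b_prev = q, beta[i]
        q = w / beta[i]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T

def lanczos_quad_form(matvec, x, k, f=np.exp):
    """Lanczos approximation to x^T f(M) x = ||x||^2 * e_1^T f(T_k) e_1."""
    _, T = lanczos(matvec, x, k)
    evals, V = np.linalg.eigh(T)   # T_k is small, so this is cheap
    return (x @ x) * (V[0, :] ** 2 * f(evals)).sum()
```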

To use the code you simply do something like:
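For instance (a hypothetical usage, continuing from the sketch above; the post's exact call isn't preserved):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
A = rng.standard_normal((n, n))
M = (A + A.T) / np.sqrt(n)       # a symmetric test matrix
x = rng.standard_normal(n)
est = lanczos_quad_form(lambda v: M @ v, x, k=30)  # approximates x^T exp(M) x
```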

Even for a matrix of $n = 3000$, it's pretty easy to see that this approach is much faster than an SVD or eigenvalue decomposition. For instance, we can generate a matrix with specified eigenvalues:
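The post's eigenvalue choice isn't preserved; as a stand-in, one standard construction plants a known spectrum via a random orthogonal similarity (the eigenvalues below are illustrative):

```python
lam = np.linspace(-1.0, 1.0, n)            # assumed, illustrative spectrum
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = (Q * lam) @ Q.T                        # M = Q diag(lam) Q^T
```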

For $k=7$ , the Lanczos approximation obtains full machine precision for the matrix exponential in 50ms on my machine:

On the other hand, even computing the (symmetric) eigendecomposition of M takes 2-3 seconds.


Handbook of Statistics

Pavan Turaga, …, Anuj Srivastava, in Handbook of Statistics, 2013

4 Common manifolds arising in image analysis

In this section, we list a few commonly occurring manifolds in image and video understanding. We also list the required tools needed to perform statistical analysis such as tangent spaces, exponential maps, inverse exponential maps, etc. under some standard Riemannian metrics.

The hypersphere: The $n$-dimensional hypersphere, denoted by $S^n$, can be shown to be a submanifold of $\mathbb{R}^{n+1}$. The tangent space at a point $p$, $T_p(S^n)$, is just the orthogonal complement of $p \in \mathbb{R}^{n+1}$. Geodesics on a unit sphere $S^n$ are great circles (Boothby, 1975). Using the standard Riemannian metric, i.e., $\langle v_1, v_2 \rangle = v_1^T v_2$ for any $v_1, v_2 \in T_p(S^n)$, the geodesics can be computed. The distance-minimizing geodesic between two points $p$ and $q$ is the shorter of the two arcs of the great circle joining them. As a parameterized curve, this geodesic is given by

$$\alpha(t) = \frac{1}{\sin\theta}\left[\sin((1-t)\theta)\,p + \sin(t\theta)\,q\right], \qquad \theta = \cos^{-1}(p^T q)$$
The exponential map on a sphere, $\exp: T_p(S^n) \mapsto S^n$, is given by $\exp_p(v) = \cos(\|v\|)\,p + \sin(\|v\|)\,\frac{v}{\|v\|}$.
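A minimal NumPy sketch of this exponential map (the function name is mine, not the chapter's):

```python
import numpy as np

def sphere_exp(p, v):
    """exp_p(v) for a unit vector p and a tangent vector v with p . v = 0."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:          # exp_p(0) = p
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)
```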

Special orthogonal group: The set of orthogonal matrices $O(n)$ is the subset of the manifold $GL(n)$ of matrices satisfying $OO^T = I$. Those orthogonal matrices with determinant $+1$ form the special orthogonal group, denoted by $SO(n)$. One can show that the tangent space is $T_O O(n) = \{OX \mid X \text{ is an } n \times n \text{ skew-symmetric matrix}\}$. Define the inner product for any $Y, Z \in T_O O(n)$ by $\langle Y, Z \rangle = \operatorname{trace}(YZ^T)$, where trace denotes the sum of diagonal elements.

To define geodesics on $SO(n)$ with respect to the Riemannian metric defined above, we need the matrix exponential. For any $O \in SO(n)$ and any skew-symmetric matrix $X$, $\alpha(t) \equiv O\,\mathrm{expm}(tX)$ is the unique geodesic in $SO(n)$ passing through $O$ with velocity $OX$ at $t = 0$ (Boothby, 1975). The exponential map for $SO(n)$ is given by $\exp_O(X) = O\,\mathrm{expm}(O^T X)$, and the inverse exponential map is given by $\exp_{O_1}^{-1}(O_2) = O_1\,\mathrm{logm}(O_1^T O_2)$, where expm and logm refer to the matrix exponential and matrix logarithm, respectively.
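A small SciPy sketch of these two maps (function names are mine; scipy.linalg provides expm and logm):

```python
import numpy as np
from scipy.linalg import expm, logm

def so_exp(O, X):
    """Exponential map at O; X is a tangent vector of the form O @ skew."""
    return O @ expm(O.T @ X)

def so_log(O1, O2):
    """Inverse exponential map: tangent vector at O1 pointing toward O2."""
    return O1 @ logm(O1.T @ O2).real   # .real guards against roundoff imaginaries
```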

Stiefel and Grassmann manifolds: The Stiefel and Grassmann manifolds are studied as quotient spaces of $SO(n)$. The Stiefel manifold $S_{n,d}$ is the set of all $d$-dimensional orthogonal bases in $\mathbb{R}^n$, while the Grassmann manifold $G_{n,d}$ is the space of $d$-dimensional subspaces of $\mathbb{R}^n$. Elements of $S_{n,d}$ are represented by $n \times d$ orthonormal matrices, i.e., $U \in S_{n,d}$ implies $U \in \mathbb{R}^{n \times d}$ with $U^T U = I_d$. The tangent space at any point $U$ is

where $O = [U\ V]$ such that $V$ is any arbitrary basis of the space perpendicular to $U$ in $\mathbb{R}^n$. Similarly, elements of $G_{n,d}$ are denoted by $[U] = \{UQ \mid Q \in SO(d)\}$, and the tangent space at any point $[U]$ is

and O is a completion of U as earlier. Geodesics in S n , d and G n , d can be realized as geodesics in the larger space SO ( n ) as long as they are perpendicular to the corresponding orbits.

Symmetric positive definite matrices: The space of $d \times d$ symmetric positive definite matrices (tensors/covariance matrices) is denoted by $\mathrm{Sym}^+(d)$. The tangent space at any point $X$ in $\mathrm{Sym}^+(d)$ is given by the set of $d \times d$ symmetric matrices, i.e., $\mathrm{Sym}(d)$. For a given point $X$ and any two tangent vectors $Y, Z \in T_X \mathrm{Sym}^+(d)$, we use the inner product $\langle Y, Z \rangle_X = \operatorname{trace}(X^{-1/2} Y X^{-1} Z X^{-1/2})$ (Pennec et al., 2006). Under this Riemannian metric, the geodesic passing through a point $X$ in the direction specified by tangent vector $W$ is given by $\gamma(t) = X^{1/2}\,\mathrm{expm}(t X^{-1/2} W X^{-1/2})\,X^{1/2}$. Setting $t = 1$, the exponential map of a point $y \in T_X$ at $X$ is given by

$$\exp_X(y) = X^{1/2}\,\mathrm{expm}(X^{-1/2}\,y\,X^{-1/2})\,X^{1/2}$$

and the inverse exponential map is given by

$$\exp_X^{-1}(Y) = X^{1/2}\,\mathrm{logm}(X^{-1/2}\,Y\,X^{-1/2})\,X^{1/2}$$
where the expm and logm refer to the matrix exponential and matrix logarithm, respectively.
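A corresponding NumPy/SciPy sketch (function names are mine; the symmetric square root is built from an eigendecomposition to keep everything real for SPD inputs):

```python
import numpy as np
from scipy.linalg import expm, logm

def _spd_sqrt(X):
    """Symmetric square root of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(w)) @ V.T

def spd_exp(X, y):
    """Exponential map at X applied to a symmetric tangent vector y."""
    Xh = _spd_sqrt(X)
    Xhi = np.linalg.inv(Xh)
    return Xh @ expm(Xhi @ y @ Xhi) @ Xh

def spd_log(X, Y):
    """Inverse exponential map: the tangent vector at X pointing to Y."""
    Xh = _spd_sqrt(X)
    Xhi = np.linalg.inv(Xh)
    return Xh @ logm(Xhi @ Y @ Xhi).real @ Xh
```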




Answers and Replies

##ln(P)## is only well-defined when P is invertible; otherwise P does not have a logarithm.

The wikipedia article on matrix logarithms has some discussion on computing the logarithm in the non-diagonalizable case.

MATLAB has a logm function to take matrix logs for numerical calculations.

If you're doing symbolic work, there's one trick I remember from doing exponentials by hand. I don't know the fancy/correct term for it, but oftentimes when evaluating a matrix power series the powers of the matrix will repeat after a certain number of powers. For example, the generator of the 2D special orthogonal group is [0, -1; 1, 0]. Square that and you get [-1, 0; 0, -1]. Cube it and you get [0, 1; -1, 0]. The fourth power gives you [1, 0; 0, 1]. Since that's just the identity, the fifth power is [0, -1; 1, 0] and the cycle repeats: the fifth power equals the first, the sixth equals the second, and so on. I suppose you could say that in such cases the powers of [0, -1; 1, 0] form a group isomorphic to Z4 under matrix multiplication, or I could be just making a fool outta myself. That way the power series reduces to a sum over four terms. It should apply to any convergent Taylor series if the powers of the generating matrix repeat.
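For what it's worth, the cycle and the resulting closed form are easy to check symbolically; a small SymPy sketch:

```python
import sympy as sp

t = sp.symbols('t')
J = sp.Matrix([[0, -1], [1, 0]])
for i in range(1, 9):
    print(i, (J**i).tolist())      # the powers cycle with period 4
print(sp.simplify((J * t).exp()))  # Matrix([[cos(t), -sin(t)], [sin(t), cos(t)]])
```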

Can you give us a little more info on what logs you want to take?


