4.11: Orthogonality

Learning Objectives

1. Determine if a given set is orthogonal or orthonormal.
2. Determine if a given matrix is orthogonal.
3. Given a linearly independent set, use the Gram-Schmidt Process to find corresponding orthogonal and orthonormal sets.
4. Find the orthogonal projection of a vector onto a subspace.
5. Find the least squares approximation for a collection of points.

In this section, we examine what it means for vectors (and sets of vectors) to be orthogonal and orthonormal. You may recall the definitions for the span of a set of vectors and a linearly independent set of vectors. We include the definitions and examples here for convenience.

Definition \(\PageIndex{1}\): Span of a Set of Vectors and Subspace

The collection of all linear combinations of a set of vectors \(\{ \vec{u}_1, \cdots ,\vec{u}_k\}\) in \(\mathbb{R}^{n}\) is known as the span of these vectors and is written as \(\mathrm{span} \{\vec{u}_1, \cdots , \vec{u}_k\}\).
We call a collection of the form \(\mathrm{span} \{\vec{u}_1, \cdots , \vec{u}_k\}\) a subspace of \(\mathbb{R}^{n}\).

Consider the following example.

Example \(\PageIndex{1}\): Spanning Vectors

Describe the span of the vectors \(\vec{u}=\left[ \begin{array}{rrr} 1 & 1 & 0 \end{array} \right]^T\) and \(\vec{v}=\left[ \begin{array}{rrr} 3 & 2 & 0 \end{array} \right]^T \in \mathbb{R}^{3}\).

Solution

You can see that any linear combination of the vectors \(\vec{u}\) and \(\vec{v}\) yields a vector \(\left[ \begin{array}{rrr} x & y & 0 \end{array} \right]^T\) in the \(XY\)-plane.

Moreover every vector in the \(XY\)-plane is in fact such a linear combination of the vectors \(\vec{u}\) and \(\vec{v}\). That's because \[\left[ \begin{array}{r} x \\ y \\ 0 \end{array} \right] = (-2x+3y) \left[ \begin{array}{r} 1 \\ 1 \\ 0 \end{array} \right] + (x-y)\left[ \begin{array}{r} 3 \\ 2 \\ 0 \end{array} \right]\]

Thus \(\mathrm{span}\{\vec{u},\vec{v}\}\) is precisely the \(XY\)-plane.
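The coefficients used in the solution above can be spot-checked numerically. The following is a minimal sketch in Python; the helper name `combine` and the sample points are illustrative assumptions, not part of the text.

```python
# Spot-check: (-2x + 3y) * u + (x - y) * v should equal (x, y, 0).
u = (1, 1, 0)
v = (3, 2, 0)

def combine(x, y):
    """Hypothetical helper: build the linear combination from the solution."""
    a = -2 * x + 3 * y  # coefficient of u
    b = x - y           # coefficient of v
    return tuple(a * ui + b * vi for ui, vi in zip(u, v))

# A few arbitrary sample points (x, y) in the XY-plane.
samples = [(0, 0), (1, 0), (0, 1), (7, -4)]
results = [combine(x, y) for x, y in samples]
```

Each entry of `results` should reproduce the corresponding sample point with a zero third coordinate. A handful of samples is only a sanity check; the algebraic identity in the solution is what proves the claim.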

The span of a set of vectors in \(\mathbb{R}^n\) is what we call a subspace of \(\mathbb{R}^n\). A subspace \(W\) is characterized by the feature that any linear combination of vectors of \(W\) is again a vector contained in \(W\).

Another important property of sets of vectors is called linear independence.

Definition: Linear Independence

A set of non-zero vectors \(\{ \vec{u}_1, \cdots ,\vec{u}_k\}\) in \(\mathbb{R}^{n}\) is said to be linearly independent if no vector in that set is in the span of the other vectors of that set.

Here is an example.

Example \(\PageIndex{2}\): Linearly Independent Vectors

Consider vectors \(\vec{u}=\left[ \begin{array}{rrr} 1 & 1 & 0 \end{array} \right]^T\), \(\vec{v}=\left[ \begin{array}{rrr} 3 & 2 & 0 \end{array} \right]^T\), and \(\vec{w}=\left[ \begin{array}{rrr} 4 & 5 & 0 \end{array} \right]^T \in \mathbb{R}^{3}\). Verify whether the set \(\{\vec{u}, \vec{v}, \vec{w}\}\) is linearly independent.

Solution

We already verified in Example \(\PageIndex{1}\) that \(\mathrm{span} \{\vec{u}, \vec{v} \}\) is the \(XY\)-plane. Since \(\vec{w}\) is clearly also in the \(XY\)-plane, the set \(\{\vec{u}, \vec{v}, \vec{w}\}\) is not linearly independent.

In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors. In the previous example you can see that the vector \(\vec{w}\) does not help to span any new vector not already in the span of the other two vectors. However you can verify that the set \(\{\vec{u}, \vec{v}\}\) is linearly independent, since you will not get the \(XY\)-plane as the span of a single vector.

We can also determine if a set of vectors is linearly independent by examining linear combinations. A set of vectors is linearly independent if and only if whenever a linear combination of these vectors equals zero, it follows that all the coefficients equal zero. It is a good exercise to verify this equivalence, and this latter condition is often used as the (equivalent) definition of linear independence.
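One way to apply this criterion mechanically is to row-reduce the matrix whose rows are the given vectors: the set is linearly independent exactly when the rank equals the number of vectors. The sketch below uses exact rational arithmetic; the function names `rank` and `is_linearly_independent` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when only the trivial linear combination gives the zero vector."""
    return rank(vectors) == len(vectors)

u, v, w = (1, 1, 0), (3, 2, 0), (4, 5, 0)
```

With the vectors of the previous example, `is_linearly_independent([u, v])` should hold while `is_linearly_independent([u, v, w])` should not, in agreement with the discussion above.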

If a linearly independent set of vectors spans a subspace, then we say that the set is a basis for the subspace.

Definition: Basis

Let \(V\) be a subspace of \(\mathbb{R}^{n}\). Then \(\left\{ \vec{u}_{1},\cdots ,\vec{u}_{k}\right\}\) is a basis for \(V\) if the following two conditions hold.

1. \(\mathrm{span}\left\{ \vec{u}_{1},\cdots ,\vec{u}_{k}\right\} =V\)
2. \(\left\{ \vec{u}_{1},\cdots ,\vec{u}_{k}\right\}\) is linearly independent

Thus the set of vectors \(\{\vec{u}, \vec{v}\}\) from Example \(\PageIndex{2}\) is a basis for the \(XY\)-plane in \(\mathbb{R}^{3}\) since it is both linearly independent and spans the \(XY\)-plane.

Recall from the properties of the dot product of vectors that two vectors \(\vec{u}\) and \(\vec{v}\) are orthogonal if \(\vec{u} \cdot \vec{v} = 0\). Suppose a vector is orthogonal to a spanning set of \(\mathbb{R}^n\). What can be said about such a vector? This is the discussion in the following example.

Example \(\PageIndex{3}\): Orthogonal Vector to a Spanning Set

Let \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\}\subseteq\mathbb{R}^n\) and suppose \(\mathbb{R}^n=\mathrm{span}\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\}\). Furthermore, suppose that there exists a vector \(\vec{u}\in\mathbb{R}^n\) for which \(\vec{u}\cdot \vec{x}_j=0\) for all \(j\), \(1\leq j\leq k\). What type of vector is \(\vec{u}\)?

Solution

Write \(\vec{u}=t_1\vec{x}_1 + t_2\vec{x}_2 +\cdots +t_k\vec{x}_k\) for some \(t_1, t_2, \ldots, t_k\in\mathbb{R}\) (this is possible because \(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\) span \(\mathbb{R}^n\)).

Then

\[\begin{align*} \| \vec{u} \| ^2 & = \vec{u}\cdot\vec{u} \\ & = \vec{u}\cdot(t_1\vec{x}_1 + t_2\vec{x}_2 +\cdots +t_k\vec{x}_k) \\ & = \vec{u}\cdot (t_1\vec{x}_1) + \vec{u}\cdot (t_2\vec{x}_2) + \cdots + \vec{u}\cdot (t_k\vec{x}_k) \\ & = t_1(\vec{u}\cdot \vec{x}_1) + t_2(\vec{u}\cdot \vec{x}_2) + \cdots + t_k(\vec{u}\cdot \vec{x}_k) \\ & = t_1(0) + t_2(0) + \cdots + t_k(0) = 0.\end{align*}\]

Since \(\| \vec{u} \| ^2 =0\), \(\| \vec{u} \| =0\). We know that \(\| \vec{u} \| =0\) if and only if \(\vec{u}=\vec{0}_n\). Therefore, \(\vec{u}=\vec{0}_n\). In conclusion, the only vector orthogonal to every vector of a spanning set of \(\mathbb{R}^n\) is the zero vector.

We can now discuss what is meant by an orthogonal set of vectors.

Definition \(\PageIndex{1}\): Orthogonal Set of Vectors

Let \(\{ \vec{u}_1, \vec{u}_2, \cdots, \vec{u}_m \}\) be a set of vectors in \(\mathbb{R}^n\). Then this set is called an orthogonal set if the following conditions hold:

1. \(\vec{u}_i \cdot \vec{u}_j = 0\) for all \(i \neq j\)
2. \(\vec{u}_i \neq \vec{0}\) for all \(i\)

If we have an orthogonal set of vectors and normalize each vector so they have length 1, the resulting set is called an orthonormal set of vectors. They can be described as follows.

Definition \(\PageIndex{1}\): Orthonormal Set of Vectors

A set of vectors, \(\left\{ \vec{w}_{1},\cdots ,\vec{w}_{m}\right\}\) is said to be an orthonormal set if \[\vec{w}_i \cdot \vec{w}_j = \delta _{ij} = \left\{ \begin{array}{c} 1 \text{ if }i=j \\ 0 \text{ if }i \neq j \end{array} \right.\]

Note that all orthonormal sets are orthogonal, but the reverse is not necessarily true since the vectors may not be normalized. In order to normalize the vectors, we simply need to divide each one by its length.

Definition \(\PageIndex{1}\): Normalizing an Orthogonal Set

Normalizing an orthogonal set is the process of turning an orthogonal (but not orthonormal) set into an orthonormal set. If \(\{ \vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k\}\) is an orthogonal subset of \(\mathbb{R}^n\), then \[\left\{ \frac{1}{ \| \vec{u}_1 \| }\vec{u}_1, \frac{1}{ \| \vec{u}_2 \| }\vec{u}_2, \ldots, \frac{1}{ \| \vec{u}_k \| }\vec{u}_k \right\}\] is an orthonormal set.

We illustrate this concept in the following example.

Example \(\PageIndex{4}\): Orthonormal Set

Consider the set of vectors given by \[\left\{ \vec{u}_1, \vec{u}_2 \right\} = \left\{ \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \left[ \begin{array}{r} -1 \\ 1 \end{array} \right] \right\}\] Show that it is an orthogonal set of vectors but not an orthonormal one. Find the corresponding orthonormal set.

Solution

One easily verifies that \(\vec{u}_1 \cdot \vec{u}_2 = 0\), so \(\left\{ \vec{u}_1, \vec{u}_2 \right\}\) is an orthogonal set of vectors. On the other hand one can compute that \(\| \vec{u}_1 \| = \| \vec{u}_2 \| = \sqrt{2} \neq 1\) and thus it is not an orthonormal set.

Thus to find a corresponding orthonormal set, we simply need to normalize each vector. We will write \(\{ \vec{w}_1, \vec{w}_2 \}\) for the corresponding orthonormal set. Then, \[\begin{aligned} \vec{w}_1 &= \frac{1}{ \| \vec{u}_1 \| } \vec{u}_1 \\ &= \frac{1}{\sqrt{2}} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \\ &= \left[ \begin{array}{c} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right]\end{aligned}\]

Similarly, \[\begin{aligned} \vec{w}_2 &= \frac{1}{ \| \vec{u}_2 \| } \vec{u}_2 \\ &= \frac{1}{\sqrt{2}} \left[ \begin{array}{r} -1 \\ 1 \end{array} \right] \\ &= \left[ \begin{array}{r} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right]\end{aligned}\]

Therefore the corresponding orthonormal set is \[\left\{ \vec{w}_1, \vec{w}_2 \right\} = \left\{ \left[ \begin{array}{c} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right], \left[ \begin{array}{r} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right] \right\}\]

You can verify that this set is orthonormal.
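The normalization step is easy to carry out numerically. The sketch below assumes vectors are stored as plain Python lists; the helper names `dot` and `normalize` are illustrative, and floating-point results are only accurate up to rounding.

```python
import math

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Divide a non-zero vector by its length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

u1, u2 = [1, 1], [-1, 1]

# The original set is orthogonal but its vectors have length sqrt(2).
orthogonal = dot(u1, u2) == 0

# Normalizing each vector gives the orthonormal set {w1, w2}.
w1, w2 = normalize(u1), normalize(u2)
```

Checking that `dot(w1, w1)` and `dot(w2, w2)` are approximately 1 and that `dot(w1, w2)` is approximately 0 confirms the orthonormality claimed in the example.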

Consider an orthogonal set of vectors in \(\mathbb{R}^n\), written \(\{ \vec{w}_1, \cdots, \vec{w}_k \}\) with \(k \leq n\). The span of these vectors is a subspace \(W\) of \(\mathbb{R}^n\). If we could show that this orthogonal set is also linearly independent, we would have a basis of \(W\). We will show this in the next theorem.

Theorem \(\PageIndex{1}\): Orthogonal Basis of a Subspace

Let \(\{ \vec{w}_1, \vec{w}_2, \cdots, \vec{w}_k \}\) be an orthogonal set of vectors in \(\mathbb{R}^n\). Then this set is linearly independent and forms a basis for the subspace \(W = \mathrm{span} \{ \vec{w}_1, \vec{w}_2, \cdots, \vec{w}_k \}\). In particular, the same holds for every orthonormal set.

Proof

To show it is a linearly independent set, suppose a linear combination of these vectors equals \(\vec{0}\), such as: \[a_1 \vec{w}_1 + a_2 \vec{w}_2 + \cdots + a_k \vec{w}_k = \vec{0}, \quad a_i \in \mathbb{R}\] We need to show that all \(a_i = 0\). To do so, take the dot product of each side of the above equation with the vector \(\vec{w}_i\) and obtain the following.

\[\begin{aligned} \vec{w}_i \cdot (a_1 \vec{w}_1 + a_2 \vec{w}_2 + \cdots + a_k \vec{w}_k ) &= \vec{w}_i \cdot \vec{0} \\ a_1 (\vec{w}_i \cdot \vec{w}_1) + a_2 (\vec{w}_i \cdot \vec{w}_2) + \cdots + a_k (\vec{w}_i \cdot \vec{w}_k) &= 0 \end{aligned}\]

Now since the set is orthogonal, \(\vec{w}_i \cdot \vec{w}_m = 0\) for all \(m \neq i\), so we have: \[a_1 (0) + \cdots + a_i(\vec{w}_i \cdot \vec{w}_i) + \cdots + a_k (0) = 0\] \[a_i \| \vec{w}_i \| ^2 = 0\]

Since the set is orthogonal, each \(\vec{w}_i\) is non-zero, so \(\| \vec{w}_i \| ^2 \neq 0\). It follows that \(a_i =0\). Since the index \(i\) was chosen arbitrarily, the set \(\{ \vec{w}_1, \vec{w}_2, \cdots, \vec{w}_k \}\) is linearly independent.

Finally since \(W = \mbox{span} \{ \vec{w}_1, \vec{w}_2, \cdots, \vec{w}_k \}\), the set of vectors also spans \(W\) and therefore forms a basis of \(W\).

If an orthogonal set is a basis for a subspace, we call this an orthogonal basis. Similarly, if an orthonormal set is a basis, we call this an orthonormal basis.

We conclude this section with a discussion of Fourier expansions. Given any orthogonal basis \(B\) of \(\mathbb{R}^n\) and an arbitrary vector \(\vec{x} \in \mathbb{R}^n\), how do we express \(\vec{x}\) as a linear combination of vectors in \(B\)? The solution is Fourier expansion.

Theorem \(\PageIndex{1}\): Fourier Expansion

Let \(V\) be a subspace of \(\mathbb{R}^n\) and suppose \(\{ \vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m \}\) is an orthogonal basis of \(V\). Then for any \(\vec{x}\in V\),

\[\vec{x} = \left(\frac{\vec{x}\cdot \vec{u}_1}{ \| \vec{u}_1 \| ^2}\right) \vec{u}_1 + \left(\frac{\vec{x}\cdot \vec{u}_2}{ \| \vec{u}_2 \| ^2}\right) \vec{u}_2 + \cdots + \left(\frac{\vec{x}\cdot \vec{u}_m}{ \| \vec{u}_m \| ^2}\right) \vec{u}_m\]

This expression is called the Fourier expansion of \(\vec{x}\), and \[\frac{\vec{x}\cdot \vec{u}_j}{ \| \vec{u}_j \| ^2},\] \(j=1,2,\ldots,m\) are the Fourier coefficients.

Consider the following example.

Example \(\PageIndex{5}\): Fourier Expansion

Let \(\vec{u}_1= \left[\begin{array}{r} 1 \\ -1 \\ 2 \end{array}\right], \vec{u}_2= \left[\begin{array}{r} 0 \\ 2 \\ 1 \end{array}\right]\), and \(\vec{u}_3 =\left[\begin{array}{r} 5 \\ 1 \\ -2 \end{array}\right]\), and let \(\vec{x} =\left[\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right]\).

Then \(B=\{ \vec{u}_1, \vec{u}_2, \vec{u}_3\}\) is an orthogonal basis of \(\mathbb{R}^3\).

Compute the Fourier expansion of \(\vec{x}\), thus writing \(\vec{x}\) as a linear combination of the vectors of \(B\).

Solution

Since \(B\) is a basis (verify!) there is a unique way to express \(\vec{x}\) as a linear combination of the vectors of \(B\). Moreover since \(B\) is an orthogonal basis (verify!), this can be done by computing the Fourier expansion of \(\vec{x}\).

That is:

\[\vec{x} = \left(\frac{\vec{x}\cdot \vec{u}_1}{ \| \vec{u}_1 \| ^2}\right) \vec{u}_1 + \left(\frac{\vec{x}\cdot \vec{u}_2}{ \| \vec{u}_2 \| ^2}\right) \vec{u}_2 + \left(\frac{\vec{x}\cdot \vec{u}_3}{ \| \vec{u}_3 \| ^2}\right) \vec{u}_3.\]

\[\frac{\vec{x}\cdot\vec{u}_1}{ \| \vec{u}_1 \| ^2} = \frac{2}{6}, \; \frac{\vec{x}\cdot\vec{u}_2}{ \| \vec{u}_2 \| ^2} = \frac{3}{5}, \mbox{ and } \frac{\vec{x}\cdot\vec{u}_3}{ \| \vec{u}_3 \| ^2} = \frac{4}{30}.\]

Therefore, \[\left[\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right] = \frac{1}{3}\left[\begin{array}{r} 1 \\ -1 \\ 2 \end{array}\right] +\frac{3}{5}\left[\begin{array}{r} 0 \\ 2 \\ 1 \end{array}\right] +\frac{2}{15}\left[\begin{array}{r} 5 \\ 1 \\ -2 \end{array}\right].\]

Orthogonal Matrices

Recall that the process to find the inverse of a matrix was often cumbersome. In contrast, it was very easy to take the transpose of a matrix. Luckily for some special matrices, the transpose equals the inverse. When an \(n \times n\) matrix has all real entries and its transpose equals its inverse, the matrix is called an orthogonal matrix.

The precise definition is as follows.

Definition \(\PageIndex{8}\): Orthogonal Matrices

A real \(n \times n\) matrix \(U\) is called an orthogonal matrix if

\[UU^{T}=U^{T}U=I.\]

Note that since \(U\) is assumed to be a square matrix, it suffices to verify that only one of the equalities \(UU^{T}=I\) or \(U^{T}U=I\) holds to guarantee that \(U^T\) is the inverse of \(U\).

Consider the following example.

Example \(\PageIndex{6}\): Orthogonal Matrix

Show the matrix \[U=\left[ \begin{array}{rr} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{array} \right]\] is orthogonal.

Solution

All we need to do is verify (one of the equations from) the requirements of Definition \(\PageIndex{8}\).

\[UU^{T}=\left[ \begin{array}{rr} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{array} \right] \left[ \begin{array}{rr} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{array} \right] = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]\]

Since \(UU^{T} = I\), this matrix is orthogonal.

Here is another example.

Example \(\PageIndex{7}\): Orthogonal Matrix

Let \(U=\left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{array} \right] .\) Is \(U\) orthogonal?

Solution

Again the answer is yes and this can be verified simply by showing that \(U^{T}U=I\):

\[\begin{aligned} U^{T}U&=\left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{array} \right] ^{T}\left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{array} \right] \\ &=\left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{array} \right] \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{array} \right] \\ &=\left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]\end{aligned}\]

When we say that \(U\) is orthogonal, we are saying that \(UU^T=I\), meaning that \[\sum_{j}u_{ij}u_{jk}^{T}=\sum_{j}u_{ij}u_{kj}=\delta _{ik}\] where \(\delta _{ij}\) is the Kronecker symbol defined by \[\delta _{ij}=\left\{ \begin{array}{c} 1 \text{ if }i=j \\ 0 \text{ if }i \neq j \end{array} \right.\]

In words, the product of the \(i^{th}\) row of \(U\) with the \(k^{th}\) row gives \(1\) if \(i=k\) and \(0\) if \(i \neq k.\) The same is true of the columns because \(U^{T}U=I\) also. Therefore, \[\sum_{j}u_{ij}^{T}u_{jk}=\sum_{j}u_{ji}u_{jk}=\delta _{ik}\] which says that the product of one column with another column gives \(1\) if the two columns are the same and \(0\) if the two columns are different.

More succinctly, this states that if \(\vec{u}_{1},\cdots ,\vec{u}_{n}\) are the columns of \(U,\) an orthogonal matrix, then \[\vec{u}_{i}\cdot \vec{u}_{j}=\delta _{ij} = \left\{ \begin{array}{c} 1 \text{ if }i=j \\ 0 \text{ if }i \neq j \end{array} \right.\]

We say that the columns form an orthonormal set of vectors, and similarly for the rows. Thus a square matrix is orthogonal if and only if its rows (or columns) form an orthonormal set of vectors. Notice that the convention is to call such a matrix orthogonal rather than orthonormal (although the latter name might make more sense!).
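The defining condition \(U^{T}U=I\) is straightforward to test numerically. The sketch below stores matrices as lists of rows; the helper names (`transpose`, `matmul`, `is_orthogonal`) and the tolerance `tol` are illustrative assumptions, the tolerance being needed because entries such as \(1/\sqrt{2}\) are not exact in floating point.

```python
import math

def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Matrix product a * b for compatible lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(m, tol=1e-12):
    """True when m is square and m^T m equals the identity up to `tol`."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    product = matmul(transpose(m), m)
    return all(
        abs(product[i][j] - (1 if i == j else 0)) <= tol
        for i in range(n)
        for j in range(n)
    )

s = 1 / math.sqrt(2)
U1 = [[s, s], [s, -s]]                      # matrix from the first example
U2 = [[1, 0, 0], [0, 0, -1], [0, -1, 0]]    # matrix from the second example
shear = [[1, 1], [0, 1]]                    # columns are not orthonormal
```

Both example matrices should pass `is_orthogonal`, while `shear` should fail because its second column has length \(\sqrt{2}\) and is not orthogonal to the first.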

Proposition \(\PageIndex{1}\)

The rows of an \(n \times n\) orthogonal matrix form an orthonormal basis of \(\mathbb{R}^n\). Further, any orthonormal basis of \(\mathbb{R}^n\) can be used to construct an \(n \times n\) orthogonal matrix.

Proof

Recall from the theorem on the orthogonal basis of a subspace that an orthonormal set is linearly independent and forms a basis for its span. Since the rows of an \(n \times n\) orthogonal matrix form an orthonormal set, they must be linearly independent. Now we have \(n\) linearly independent vectors in \(\mathbb{R}^n\), and it follows that their span equals \(\mathbb{R}^n\). Therefore these vectors form an orthonormal basis for \(\mathbb{R}^n\).

Suppose now that we have an orthonormal basis for \(\mathbb{R}^n\). Since the basis will contain \(n\) vectors, these can be used to construct an \(n \times n\) matrix, with each vector becoming a row. Therefore the matrix is composed of orthonormal rows, which by our above discussion, means that the matrix is orthogonal. Note we could also have constructed a matrix with each vector becoming a column instead, and this would again be an orthogonal matrix. In fact this is simply the transpose of the previous matrix.

Consider the following proposition.

Proposition \(\PageIndex{2}\): Determinant of an Orthogonal Matrix

Suppose \(U\) is an orthogonal matrix. Then \(\det \left( U\right) = \pm 1.\)

Proof

This result follows from the properties of determinants. Recall that for any square matrix \(A\), \(\det(A^T) = \det(A)\). Now if \(U\) is orthogonal, then: \[(\det \left( U\right)) ^{2}=\det \left( U^{T}\right) \det \left( U\right) =\det \left( U^{T}U\right) =\det \left( I\right) =1\]

Therefore \((\det (U))^2 = 1\) and it follows that \(\det \left( U\right) = \pm 1\).

Orthogonal matrices are divided into two classes, proper and improper. The proper orthogonal matrices are those whose determinant equals 1 and the improper ones are those whose determinant equals \(-1\). The reason for the distinction is that the improper orthogonal matrices are sometimes considered to have no physical significance. These matrices cause a change in orientation which would correspond to material passing through itself in a nonphysical manner. Thus in considering which coordinate systems must be considered in certain applications, you only need to consider those which are related by a proper orthogonal transformation. Geometrically, the linear transformations determined by the proper orthogonal matrices correspond to the composition of rotations.
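The proper/improper distinction can be illustrated by computing determinants directly. The sketch below uses a cofactor expansion along the first row, which is fine for small matrices; the function names `det` and `classify` and the tolerance are illustrative assumptions, and `classify` presumes its input has already been checked to be orthogonal.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def classify(m, tol=1e-12):
    """Label an orthogonal matrix as 'proper' (det 1) or 'improper' (det -1).

    Assumption: m is orthogonal; any other determinant raises ValueError.
    """
    d = det(m)
    if abs(d - 1) <= tol:
        return "proper"
    if abs(d + 1) <= tol:
        return "improper"
    raise ValueError("determinant is not +1 or -1, so the matrix is not orthogonal")

rotation_90 = [[0, -1], [1, 0]]                 # rotation of the plane by 90 degrees
example_7 = [[1, 0, 0], [0, 0, -1], [0, -1, 0]]  # matrix from the earlier example
```

The rotation has determinant 1 and is proper, while the matrix from the earlier example has determinant \(-1\) and is therefore improper.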

We conclude this section with two useful properties of orthogonal matrices.

Theorem \(\PageIndex{3}\)

Suppose \(A\) and \(B\) are orthogonal matrices. Then \(AB\) and \(A^{-1}\) both exist and are orthogonal.

Proof

First we examine the product \(AB\). \[(AB)(B^TA^T)=A(BB^T)A^T =AA^T=I\] Since \(AB\) is square, \(B^TA^T=(AB)^T\) is the inverse of \(AB\), so \(AB\) is invertible, and \((AB)^{-1}=(AB)^T\). Therefore, \(AB\) is orthogonal.

Next we show that \(A^{-1}=A^T\) is also orthogonal. \[(A^{-1})^{-1} = A = (A^T)^{T} =(A^{-1})^{T}\] Therefore \(A^{-1}\) is also orthogonal.

Gram-Schmidt Process

The Gram-Schmidt process is an algorithm that transforms a linearly independent set of vectors into an orthonormal set spanning the same subspace, that is, generating the same collection of linear combinations (see the definition of span earlier in this section).

The first objective is to construct an orthogonal set of vectors with the same span, since from there an orthonormal set can be obtained by simply dividing each vector by its length.

Algorithm \(\PageIndex{7}\): Gram-Schmidt Process

Let \(\{ \vec{u}_1,\cdots ,\vec{u}_n \}\) be a set of linearly independent vectors in \(\mathbb{R}^{n}\).

I: Construct a new set of vectors \(\{ \vec{v}_1,\cdots ,\vec{v}_n \}\) as follows: \[\begin{array}{ll} \vec{v}_1 & = \vec{u}_1 \\ \vec{v}_{2} & = \vec{u}_{2} - \left( \dfrac{ \vec{u}_2 \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2} \right) \vec{v}_1 \\ \vec{v}_{3} & = \vec{u}_{3} - \left( \dfrac{\vec{u}_3 \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2} \right) \vec{v}_1 - \left( \dfrac{\vec{u}_3 \cdot \vec{v}_2}{ \| \vec{v}_2 \| ^2} \right) \vec{v}_2 \\ \vdots \\ \vec{v}_{n} & = \vec{u}_{n} - \left( \dfrac{\vec{u}_n \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2} \right) \vec{v}_1 - \left( \dfrac{\vec{u}_n \cdot \vec{v}_2}{ \| \vec{v}_2 \| ^2} \right) \vec{v}_2 - \cdots - \left( \dfrac{\vec{u}_{n} \cdot \vec{v}_{n-1}}{ \| \vec{v}_{n-1} \| ^2} \right) \vec{v}_{n-1} \end{array}\]

II: Now let \(\vec{w}_i = \dfrac{\vec{v}_i}{ \| \vec{v}_i \| }\) for \(i=1, \cdots ,n\).

Then

1. \(\left\{ \vec{v}_1, \cdots, \vec{v}_n \right\}\) is an orthogonal set.
2. \(\left\{ \vec{w}_1,\cdots , \vec{w}_n \right\}\) is an orthonormal set.
3. \(\mathrm{span}\left\{ \vec{u}_1,\cdots ,\vec{u}_n \right\} = \mathrm{span} \left\{ \vec{v}_1, \cdots, \vec{v}_n \right\} = \mathrm{span}\left\{ \vec{w}_1,\cdots ,\vec{w}_n \right\}\).

Proof

The full proof of this algorithm is beyond the scope of this material; however, here is an indication of the arguments.

To show that \(\left\{ \vec{v}_1,\cdots , \vec{v}_n \right\}\) is an orthogonal set, let \[a_2 = \dfrac{ \vec{u}_2 \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2}\] then: \[\begin{array}{ll} \vec{v}_1 \cdot \vec{v}_2 & = \vec{v}_1 \cdot \left( \vec{u}_2 - a_2 \vec{v}_1 \right) \\ & = \vec{v}_1 \cdot \vec{u}_2 - a_2 (\vec{v}_1 \cdot \vec{v}_1) \\ & = \vec{v}_1 \cdot \vec{u}_2 - \dfrac{ \vec{u}_2 \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2} \| \vec{v}_1 \| ^2 \\ & = ( \vec{v}_1 \cdot \vec{u}_2 ) - ( \vec{u}_2 \cdot \vec{v}_1 ) =0 \end{array}\] Now that you have shown that \(\{ \vec{v}_1, \vec{v}_2\}\) is orthogonal, use the same method as above to show that \(\{ \vec{v}_1, \vec{v}_2, \vec{v}_3\}\) is also orthogonal, and so on.

Then in a similar fashion you show that \(\mathrm{span}\left\{ \vec{u}_1,\cdots ,\vec{u}_n \right\} = \mathrm{span}\left\{ \vec{v}_1,\cdots ,\vec{v}_n \right\}\).

Finally defining \(\vec{w}_i = \dfrac{\vec{v}_i}{ \| \vec{v}_i \| }\) for \(i=1, \cdots ,n\) does not affect orthogonality and yields vectors of length 1, hence an orthonormal set. You can also observe that it does not affect the span either, and the proof is then complete.

Consider the following example.

Example \(\PageIndex{8}\): Find Orthonormal Set with Same Span

Consider the set of vectors \(\{\vec{u}_1, \vec{u}_2\}\) given as in Example \(\PageIndex{1}\). That is \[\vec{u}_1=\left[ \begin{array}{r} 1 \\ 1 \\ 0 \end{array} \right], \vec{u}_2=\left[ \begin{array}{r} 3 \\ 2 \\ 0 \end{array} \right] \in \mathbb{R}^{3}\]

Use the Gram-Schmidt algorithm to find an orthonormal set of vectors \(\{\vec{w}_1, \vec{w}_2\}\) having the same span.

Solution

We already remarked that the set of vectors \(\{\vec{u}_1, \vec{u}_2\}\) is linearly independent, so we can proceed with the Gram-Schmidt algorithm: \[\begin{aligned} \vec{v}_1 &= \vec{u}_1 = \left[ \begin{array}{r} 1 \\ 1 \\ 0 \end{array} \right] \\ \vec{v}_{2} &= \vec{u}_{2} - \left( \dfrac{\vec{u}_2 \cdot \vec{v}_1}{ \| \vec{v}_1 \| ^2} \right) \vec{v}_1 \\ &= \left[ \begin{array}{r} 3 \\ 2 \\ 0 \end{array} \right] - \frac{5}{2} \left[ \begin{array}{r} 1 \\ 1 \\ 0 \end{array} \right] \\ &= \left[ \begin{array}{r} \frac{1}{2} \\ - \frac{1}{2} \\ 0 \end{array} \right] \end{aligned}\]

Now to normalize simply let \[\begin{aligned} \vec{w}_1 &= \frac{\vec{v}_1}{ \| \vec{v}_1 \| } = \left[ \begin{array}{r} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{array} \right] \\ \vec{w}_2 &= \frac{\vec{v}_2}{ \| \vec{v}_2 \| } = \left[ \begin{array}{r} \frac{1}{\sqrt{2}} \\ - \frac{1}{\sqrt{2}} \\ 0 \end{array} \right]\end{aligned}\]

You can verify that \(\{\vec{w}_1, \vec{w}_2\}\) is an orthonormal set of vectors having the same span as \(\{\vec{u}_1, \vec{u}_2\}\), namely the \(XY\)-plane.
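The two steps of the algorithm translate almost line for line into code. The following is a minimal floating-point sketch of the process as stated above (subtracting from each \(\vec{u}_i\) its components along the earlier \(\vec{v}_j\), then normalizing); the function name `gram_schmidt` is illustrative, and the input is assumed to be linearly independent, which the code does not check.

```python
import math

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return (orthogonal, orthonormal) lists built from `vectors`.

    Assumption: `vectors` is linearly independent, so no v_i is zero.
    """
    orthogonal = []
    for u in vectors:
        v = list(u)
        # Step I: subtract the component of u along each earlier v_j.
        for prev in orthogonal:
            coeff = dot(u, prev) / dot(prev, prev)
            v = [vi - coeff * pi for vi, pi in zip(v, prev)]
        orthogonal.append(v)
    # Step II: divide each v_i by its length.
    orthonormal = [[x / math.sqrt(dot(v, v)) for x in v] for v in orthogonal]
    return orthogonal, orthonormal

vs, ws = gram_schmidt([[1, 1, 0], [3, 2, 0]])
```

For the vectors of this example, `vs[1]` should come out as approximately (1/2, -1/2, 0) and `ws` should match the orthonormal set found by hand, up to rounding. For nearly dependent inputs, floating-point error can degrade orthogonality, so this sketch is best suited to small, well-conditioned examples.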

In this example, we began with a linearly independent set and found an orthonormal set of vectors which had the same span. It turns out that if we start with a basis of a subspace and apply the Gram-Schmidt algorithm, the result will be an orthogonal basis of the same subspace. We examine this in the following example.

Example \(\PageIndex{9}\): Find a Corresponding Orthogonal Basis

Let \[\vec{x}_1=\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right], \vec{x}_2=\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 1 \end{array}\right], \mbox{ and } \vec{x}_3=\left[\begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right],\] and let \(U=\mathrm{span}\{\vec{x}_1, \vec{x}_2,\vec{x}_3\}\). Use the Gram-Schmidt Process to construct an orthogonal basis \(B\) of \(U\).

Solution

First \(\vec{f}_1=\vec{x}_1\).

Next, \[\vec{f}_2=\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 1 \end{array}\right] -\frac{2}{2}\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right] =\left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right].\]

Finally, \[\vec{f}_3=\left[\begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right] -\frac{1}{2}\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right] -\frac{0}{1}\left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] =\left[\begin{array}{c} 1/2 \\ 1 \\ -1/2 \\ 0 \end{array}\right].\]

Therefore, \[\left\{ \left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right], \left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right], \left[\begin{array}{c} 1/2 \\ 1 \\ -1/2 \\ 0 \end{array}\right] \right\}\] is an orthogonal basis of \(U\). However, it is sometimes more convenient to deal with vectors having integer entries, in which case we take \[B=\left\{ \left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right], \left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right], \left[\begin{array}{r} 1 \\ 2 \\ -1 \\ 0 \end{array}\right] \right\}.\]

Orthogonal Projections

An important use of the Gram-Schmidt Process is in orthogonal projections, the focus of this section.

You may recall that a subspace of \(\mathbb{R}^n\) is a set of vectors which contains the zero vector, and is closed under addition and scalar multiplication. Let's call such a subspace \(W\). In particular, a plane in \(\mathbb{R}^n\) which contains the origin, \(\left(0,0, \cdots, 0 \right)\), is a subspace of \(\mathbb{R}^n\).

Suppose a point \(Y\) in \(\mathbb{R}^n\) is not contained in \(W\). What point \(Z\) in \(W\) is closest to \(Y\)? Using the Gram-Schmidt Process, we can find such a point. Let \(\vec{y}, \vec{z}\) represent the position vectors of the points \(Y\) and \(Z\) respectively, with \(\vec{y}-\vec{z}\) representing the vector connecting the two points \(Y\) and \(Z\). It will follow that if \(Z\) is the point on \(W\) closest to \(Y\), then \(\vec{y} - \vec{z}\) will be perpendicular to \(W\) (can you see why?); in other words, \(\vec{y} - \vec{z}\) is orthogonal to \(W\) (and to every vector contained in \(W\)) as in the following diagram.

The vector \(\vec{z}\) is called the orthogonal projection of \(\vec{y}\) on \(W\). The definition is given as follows.

Definition \(\PageIndex{1}\): Orthogonal Projection

Let \(W\) be a subspace of \(\mathbb{R}^n\), and \(Y\) be any point in \(\mathbb{R}^n\) with position vector \(\vec{y}\). Then the orthogonal projection of \(Y\) onto \(W\) is given by \[\vec{z} = \mathrm{proj}_{W}\left( \vec{y}\right) = \left( \frac{\vec{y} \cdot \vec{w}_1}{ \| \vec{w}_1 \| ^2}\right) \vec{w}_1 + \left( \frac{\vec{y} \cdot \vec{w}_2}{ \| \vec{w}_2 \| ^2}\right) \vec{w}_2 + \cdots + \left( \frac{\vec{y} \cdot \vec{w}_m}{ \| \vec{w}_m \| ^2}\right) \vec{w}_m\] where \(\{\vec{w}_1, \vec{w}_2, \cdots, \vec{w}_m \}\) is any orthogonal basis of \(W\).

Therefore, in order to find the orthogonal projection, we must first find an orthogonal basis for the subspace. Note that one could use an orthonormal basis, but it is not necessary in this case since as you can see above the normalization of each vector is included in the formula for the projection.

Before we explore this further through an example, we show that the orthogonal projection does indeed yield a point \(Z\) (the point whose position vector is the vector \(\vec{z}\) above) which is the point of \(W\) closest to \(Y\).

Theorem \(\PageIndex{1}\): Approximation Theorem

Let \(W\) be a subspace of \(\mathbb{R}^n\) and \(Y\) any point in \(\mathbb{R}^n\). Let \(Z\) be the point whose position vector is the orthogonal projection of \(Y\) onto \(W\).

Then, \(Z\) is the point in \(W\) closest to \(Y\).

Proof

First \(Z\) is certainly a point in \(W\) since it is in the span of a basis of \(W\).

To show that \(Z\) is the point in \(W\) closest to \(Y\), we wish to show that \(\|\vec{y}-\vec{z}_1\| > \|\vec{y}-\vec{z}\|\) for all \(\vec{z}_1 \neq \vec{z}\) in \(W\). We begin by writing \(\vec{y}-\vec{z}_1 = (\vec{y} - \vec{z}) + (\vec{z} - \vec{z}_1)\). Now, the vector \(\vec{y} - \vec{z}\) is orthogonal to \(W\), and \(\vec{z} - \vec{z}_1\) is contained in \(W\). Therefore these vectors are orthogonal to each other. By the Pythagorean Theorem, we have that \[ \| \vec{y} - \vec{z}_1 \| ^2 = \| \vec{y} - \vec{z} \| ^2 + \| \vec{z} -\vec{z}_1 \| ^2 > \| \vec{y} - \vec{z} \| ^2\] This follows because \(\vec{z} \neq \vec{z}_1\) so \(\| \vec{z} -\vec{z}_1 \| ^2 > 0.\)

Hence, \(\| \vec{y} - \vec{z}_1 \| ^2 > \| \vec{y} - \vec{z} \| ^2\). Taking the square root of each side, we obtain the desired result.

Consider the following example.

Example \(\PageIndex{10}\): Orthogonal Projection

Let \(W\) be the plane through the origin given by the equation \(x - 2y + z = 0\).
Find the point in \(W\) closest to the point \(Y = (1,0,3)\).

Solution

We must first find an orthogonal basis for \(W\). Notice that \(W\) is characterized by all points \((a,b,c)\) where \(c = 2b-a\). In other words, \(W\) consists of all vectors of the form \[\left[ \begin{array}{c} a \\ b \\ 2b - a \end{array} \right] = a \left[ \begin{array}{c} 1 \\ 0 \\ -1 \end{array} \right] + b \left[ \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right], \; a,b \in \mathbb{R}\]

We can thus write \(W\) as \[\begin{aligned} W &= \mbox{span} \left\{ \vec{u}_1, \vec{u}_2 \right\} \\ &= \mbox{span} \left\{ \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right] \right\}\end{aligned}\]

Notice that this spanning set is a basis of \(W\) as it is linearly independent. We will use the Gram-Schmidt Process to convert this to an orthogonal basis, \(\left\{\vec{w}_1, \vec{w}_2 \right\}\). In this case, as we remarked it is only necessary to find an orthogonal basis, and it is not required that it be orthonormal.

\[\vec{w}_1 = \vec{u}_1 = \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right]\] \[\begin{aligned} \vec{w}_2 &= \vec{u}_2 - \left( \frac{ \vec{u}_2 \cdot \vec{w}_1}{ \| \vec{w}_1 \| ^2} \right) \vec{w}_1 \\ &= \left[ \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right] - \left( \frac{-2}{2} \right) \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right] \\ &= \left[ \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right] + \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right] \\ &= \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right]\end{aligned}\]

Therefore an orthogonal basis of \(W\) is \[\left\{ \vec{w}_1, \vec{w}_2 \right\} = \left\{ \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right], \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \right\}\]

We can now use this basis to find the orthogonal projection of the point \(Y=(1,0,3)\) on the subspace \(W\). We will write the position vector \(\vec{y}\) of \(Y\) as \(\vec{y} = \left[ \begin{array}{c} 1 \\ 0 \\ 3 \end{array} \right]\). Using the definition of the orthogonal projection, we compute the projection as follows: \[\begin{aligned} \vec{z} &= \mathrm{proj}_{W}\left( \vec{y}\right) \\ &= \left( \frac{\vec{y} \cdot \vec{w}_1}{ \| \vec{w}_1 \| ^2} \right) \vec{w}_1 + \left( \frac{\vec{y} \cdot \vec{w}_2}{ \| \vec{w}_2 \| ^2} \right) \vec{w}_2 \\ &= \left( \frac{-2}{2} \right) \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right] + \left( \frac{4}{3} \right) \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \\ &= \left[ \begin{array}{c} \frac{1}{3} \\ \frac{4}{3} \\ \frac{7}{3} \end{array} \right]\end{aligned}\]

Therefore the point \(Z\) on \(W\) closest to the point \((1,0,3)\) is \(\left( \frac{1}{3}, \frac{4}{3}, \frac{7}{3} \right)\).
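The projection formula can be checked with exact arithmetic. The sketch below assumes an orthogonal basis of \(W\) is already available (here the one found in the solution); the function name `project` is an illustrative choice.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def project(y, orthogonal_basis):
    """Orthogonal projection of y onto the span of an orthogonal basis.

    Assumption: the basis vectors are non-zero and mutually orthogonal.
    """
    z = [Fraction(0)] * len(y)
    for w in orthogonal_basis:
        coeff = Fraction(dot(y, w), dot(w, w))
        z = [zi + coeff * wi for zi, wi in zip(z, w)]
    return z

y = (1, 0, 3)
basis = [(1, 0, -1), (1, 1, 1)]   # orthogonal basis of the plane x - 2y + z = 0

z = project(y, basis)

# The residual y - z should be orthogonal to every basis vector of W.
residual = [yi - zi for yi, zi in zip(y, z)]
```

The result `z` should equal (1/3, 4/3, 7/3), and `residual` should have zero dot product with both basis vectors, which is the orthogonality property used in the proof of the Approximation Theorem.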

Recall that the vector (vec{y} - vec{z}) is perpendicular (orthogonal) to all the vectors contained in the plane (W). Using a basis for (W), we can in fact find all such vectors which are perpendicular to (W). We call this set of vectors the orthogonal complement of (W) and denote it (W^{perp}).

Definition (PageIndex{1}): Orthogonal Complement

Let (W) be a subspace of (mathbb{R}^n). Then the orthogonal complement of (W), written (W^{perp}), is the set of all vectors (vec{x}) such that (vec{x} cdot vec{z} = 0) for all vectors (vec{z}) in (W). [W^{perp} = { vec{x} in mathbb{R}^n ; mbox{such that} ; vec{x} cdot vec{z} = 0 ; mbox{for all} ; vec{z} in W }]

The orthogonal complement is defined as the set of all vectors which are orthogonal to all vectors in the original subspace. It turns out that it is sufficient that the vectors in the orthogonal complement be orthogonal to a spanning set of the original space.

Proposition (PageIndex{1}): Orthogonal to Spanning Set

Let \(W\) be a subspace of \(\mathbb{R}^n\) such that \(W = \mathrm{span} \left\{ \vec{w}_1, \vec{w}_2, \cdots, \vec{w}_m \right\}\). Then \(W^{\perp}\) is the set of all vectors which are orthogonal to each \(\vec{w}_i\) in the spanning set.

The following proposition demonstrates that the orthogonal complement of a subspace is itself a subspace.

Proposition (PageIndex{2}): The Orthogonal Complement

Let \(W\) be a subspace of \(\mathbb{R}^n\). Then the orthogonal complement \(W^{\perp}\) is also a subspace of \(\mathbb{R}^n\).

Consider the following proposition.

Proposition (PageIndex{3}): Orthogonal Complement of (mathbb{R}^n)

The orthogonal complement of \(\mathbb{R}^n\) is the set containing only the zero vector: \[(\mathbb{R}^n)^{\perp} = \left\{ \vec{0} \right\}\] Similarly, \[\left\{ \vec{0} \right\}^{\perp} = \mathbb{R}^n.\]

Proof

Here, \(\vec{0}\) is the zero vector of \(\mathbb{R}^n\). Since \(\vec{x}\cdot\vec{0}=0\) for all \(\vec{x}\in\mathbb{R}^n\), \(\mathbb{R}^n\subseteq\{ \vec{0}\}^{\perp}\). Since \(\{ \vec{0}\}^{\perp}\subseteq\mathbb{R}^n\), the equality follows, i.e., \(\{ \vec{0}\}^{\perp}=\mathbb{R}^n\).

Again, since \(\vec{x}\cdot\vec{0}=0\) for all \(\vec{x}\in\mathbb{R}^n\), \(\vec{0}\in (\mathbb{R}^n)^{\perp}\), so \(\{ \vec{0}\}\subseteq(\mathbb{R}^n)^{\perp}\). Suppose \(\vec{x}\in\mathbb{R}^n\), \(\vec{x}\neq\vec{0}\). Since \(\vec{x}\cdot\vec{x}=\|\vec{x}\|^2\) and \(\vec{x}\neq\vec{0}\), \(\vec{x}\cdot\vec{x}\neq 0\), so \(\vec{x}\notin(\mathbb{R}^n)^{\perp}\). Therefore \((\mathbb{R}^n)^{\perp}\subseteq \{\vec{0}\}\), and thus \((\mathbb{R}^n)^{\perp}=\{\vec{0}\}\).

In the next example, we will look at how to find \(W^{\perp}\).

Example (PageIndex{12}): Orthogonal Complement

Let \(W\) be the plane through the origin given by the equation \(x - 2y + z = 0\). Find a basis for the orthogonal complement of \(W\).

Solution

From Example [exa:orthproj] we know that we can write \(W\) as \[W = \mbox{span} \left\{ \vec{u}_1, \vec{u}_2 \right\} = \mbox{span} \left\{ \left[ \begin{array}{r} 1 \\ 0 \\ -1 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right] \right\}\]

In order to find \(W^{\perp}\), we need to find all \(\vec{x}\) which are orthogonal to every vector in this span.

Let \(\vec{x} = \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right]\). In order to satisfy \(\vec{x} \cdot \vec{u}_1 = 0\), the following equation must hold. \[x_1 - x_3 = 0\]

In order to satisfy \(\vec{x} \cdot \vec{u}_2 = 0\), the following equation must hold. \[x_2 + 2x_3 = 0\]

Both of these equations must be satisfied, so we have the following system of equations. \[\begin{array}{c} x_1 - x_3 = 0 \\ x_2 + 2x_3 = 0 \end{array}\]

To solve, set up the augmented matrix.

\[\left[ \begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \end{array} \right]\]

This matrix is already in reduced row-echelon form. Taking \(x_3 = t\) as the free variable gives \(x_1 = t\) and \(x_2 = -2t\). Hence \(W^{\perp} = \mbox{span} \left\{ \left[ \begin{array}{r} 1 \\ -2 \\ 1 \end{array} \right] \right\}\), and \(\left\{ \left[ \begin{array}{r} 1 \\ -2 \\ 1 \end{array} \right] \right\}\) is a basis for \(W^{\perp}\).
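For a plane in \(\mathbb{R}^3\) there is a quick cross-check: the cross product of the two spanning vectors is orthogonal to both, so it spans \(W^{\perp}\). This shortcut is specific to three dimensions and is not the method used in the text, which solves the homogeneous system. A minimal sketch, with illustrative helper names:

```python
def dot(a, b):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


u1, u2 = [1, 0, -1], [0, 1, 2]   # spanning vectors of W from the example
n = cross(u1, u2)
print(n)                          # [1, -2, 1]
print(dot(n, u1), dot(n, u2))     # 0 0
```

The result agrees with the coefficients of the plane equation \(x - 2y + z = 0\), as expected for a normal vector.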

The following results summarize the important properties of the orthogonal projection.

Theorem (PageIndex{1}): Orthogonal Projection

Let \(W\) be a subspace of \(\mathbb{R}^n\), \(Y\) be any point in \(\mathbb{R}^n\), and let \(Z\) be the point in \(W\) closest to \(Y\). Then,

1. The position vector \(\vec{z}\) of the point \(Z\) is given by \(\vec{z} = \mathrm{proj}_{W}\left( \vec{y} \right)\)
2. \(\vec{z} \in W\) and \(\vec{y} - \vec{z} \in W^{\perp}\)
3. \(\| Y - Z \| < \| Y - Z_1 \|\) for all \(Z_1 \neq Z \in W\)

Consider the following example of this concept.

Example (PageIndex{13}): Find a Vector Closest to a Given Vector

Let \[\vec{x}_1=\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right], \vec{x}_2=\left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 1 \end{array}\right], \vec{x}_3=\left[\begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right], \mbox{ and } \vec{v}=\left[\begin{array}{r} 4 \\ 3 \\ -2 \\ 5 \end{array}\right].\] We want to find the vector in \(W =\mathrm{span}\{\vec{x}_1, \vec{x}_2,\vec{x}_3\}\) closest to \(\vec{v}\).

Solution

We will first use the Gram-Schmidt Process to construct the orthogonal basis, \(B\), of \(W\): \[B=\left\{ \left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right], \left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right], \left[\begin{array}{r} 1 \\ 2 \\ -1 \\ 0 \end{array}\right] \right\}.\]

By Theorem [thm:orthproj], \[\mathrm{proj}_W(\vec{v}) = \frac{2}{2} \left[\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right] + \frac{5}{1}\left[\begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] + \frac{12}{6}\left[\begin{array}{r} 1 \\ 2 \\ -1 \\ 0 \end{array}\right] = \left[\begin{array}{r} 3 \\ 4 \\ -1 \\ 5 \end{array}\right]\] is the vector in \(W\) closest to \(\vec{v}\).

Consider the next example.

Example (PageIndex{14}): Vector Written as a Sum of Two Vectors

Let \(W\) be a subspace given by \(W = \mbox{span} \left\{ \left[ \begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array} \right], \left[ \begin{array}{c} 0 \\ 1 \\ 0 \\ 2 \end{array} \right] \right\}\), and \(Y = (1,2,3,4)\).
Find the point \(Z\) in \(W\) closest to \(Y\), and moreover write \(\vec{y}\) as the sum of a vector in \(W\) and a vector in \(W^{\perp}\).

Solution

From Theorem [thm:approximation], the point \(Z\) in \(W\) closest to \(Y\) is given by \(\vec{z} = \mathrm{proj}_{W}\left( \vec{y} \right)\).

Notice that the two spanning vectors above, which we call \(\vec{w}_1\) and \(\vec{w}_2\), already give an orthogonal basis for \(W\), so we have:

\[\begin{aligned} \vec{z} &= \mathrm{proj}_{W}\left( \vec{y} \right) \\ &= \left( \frac{\vec{y} \cdot \vec{w}_1}{ \| \vec{w}_1 \| ^2} \right) \vec{w}_1 + \left( \frac{\vec{y} \cdot \vec{w}_2}{ \| \vec{w}_2 \| ^2} \right) \vec{w}_2 \\ &= \left( \frac{4}{2} \right) \left[ \begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array} \right] + \left( \frac{10}{5} \right) \left[ \begin{array}{c} 0 \\ 1 \\ 0 \\ 2 \end{array} \right] \\ &= \left[ \begin{array}{c} 2 \\ 2 \\ 2 \\ 4 \end{array} \right]\end{aligned}\]

Therefore the point in \(W\) closest to \(Y\) is \(Z = (2,2,2,4)\).
Now, we need to write \(\vec{y}\) as the sum of a vector in \(W\) and a vector in \(W^{\perp}\). This can easily be done as follows: \[\vec{y} = \vec{z} + (\vec{y} - \vec{z})\] since \(\vec{z}\) is in \(W\) and as we have seen \(\vec{y} - \vec{z}\) is in \(W^{\perp}\).
The vector \(\vec{y} - \vec{z}\) is given by \[\vec{y} - \vec{z} = \left[ \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right] - \left[ \begin{array}{c} 2 \\ 2 \\ 2 \\ 4 \end{array} \right] = \left[ \begin{array}{r} -1 \\ 0 \\ 1 \\ 0 \end{array} \right]\] Therefore, we can write \(\vec{y}\) as \[\left[ \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right] = \left[ \begin{array}{c} 2 \\ 2 \\ 2 \\ 4 \end{array} \right] + \left[ \begin{array}{r} -1 \\ 0 \\ 1 \\ 0 \end{array} \right]\]
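The decomposition can be verified numerically: the remainder \(\vec{y} - \vec{z}\) should have zero dot product with each spanning vector of \(W\). The sketch below assumes only the data of this example; the variable names are illustrative.

```python
from fractions import Fraction


def dot(a, b):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(p * q for p, q in zip(a, b))


w1, w2 = [1, 0, 1, 0], [0, 1, 0, 2]   # orthogonal spanning vectors of W
y = [1, 2, 3, 4]

# Because w1 and w2 are orthogonal, the projection is a sum of two terms.
c1 = Fraction(dot(y, w1), dot(w1, w1))
c2 = Fraction(dot(y, w2), dot(w2, w2))
z = [c1 * a + c2 * b for a, b in zip(w1, w2)]   # component in W
r = [yi - zi for yi, zi in zip(y, z)]           # component in W-perp

print(z)                         # components 2, 2, 2, 4
print(r)                         # components -1, 0, 1, 0
print(dot(r, w1), dot(r, w2))    # 0 0
```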

Example (PageIndex{15}): Point in a Plane Closest to a Given Point

Find the point \(Z\) in the plane \(3x+y-2z=0\) that is closest to the point \(Y=(1,1,1)\).

Solution

The solution will proceed as follows.

1. Find a basis \(X\) of the subspace \(W\) of \(\mathbb{R}^3\) defined by the equation \(3x+y-2z=0\).
2. Orthogonalize the basis \(X\) to get an orthogonal basis \(B\) of \(W\).
3. Find the projection on \(W\) of the position vector of the point \(Y\).

We now begin the solution.

1. \(3x+y-2z=0\) is a system of one equation in three variables. Putting the augmented matrix in reduced row-echelon form: \[\left[\begin{array}{rrr|r} 3 & 1 & -2 & 0 \end{array}\right] \rightarrow \left[\begin{array}{rrr|r} 1 & \frac{1}{3} & -\frac{2}{3} & 0 \end{array}\right]\] gives general solution \(x=-\frac{1}{3}s+\frac{2}{3}t\), \(y=s\), \(z=t\) for any \(s,t\in\mathbb{R}\). Then \[W=\mathrm{span} \left\{ \left[\begin{array}{r} -\frac{1}{3} \\ 1 \\ 0 \end{array}\right], \left[\begin{array}{r} \frac{2}{3} \\ 0 \\ 1 \end{array}\right] \right\}\] Let \(X=\left\{ \left[\begin{array}{r} -1 \\ 3 \\ 0 \end{array}\right], \left[\begin{array}{r} 2 \\ 0 \\ 3 \end{array}\right] \right\}\). Then \(X\) is linearly independent and \(\mathrm{span}(X)=W\), so \(X\) is a basis of \(W\).
2. Use the Gram-Schmidt Process to get an orthogonal basis of \(W\):

\[\vec{f}_1=\left[\begin{array}{r} -1 \\ 3 \\ 0 \end{array}\right] \mbox{ and } \vec{f}_2 = \left[\begin{array}{r} 2 \\ 0 \\ 3 \end{array}\right] -\frac{-2}{10}\left[\begin{array}{r} -1 \\ 3 \\ 0 \end{array}\right] =\frac{1}{5}\left[\begin{array}{r} 9 \\ 3 \\ 15 \end{array}\right].\] Therefore \(B=\left\{ \left[\begin{array}{r} -1 \\ 3 \\ 0 \end{array}\right], \left[\begin{array}{r} 3 \\ 1 \\ 5 \end{array}\right] \right\}\) is an orthogonal basis of \(W\).

3. To find the point \(Z\) on \(W\) closest to \(Y=(1,1,1)\), compute \[\begin{aligned} \mathrm{proj}_{W}\left[\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right] & = \frac{2}{10} \left[\begin{array}{r} -1 \\ 3 \\ 0 \end{array}\right] + \frac{9}{35}\left[\begin{array}{r} 3 \\ 1 \\ 5 \end{array}\right] \\ & = \frac{1}{7}\left[\begin{array}{r} 4 \\ 6 \\ 9 \end{array}\right].\end{aligned}\] Therefore, \(Z=\left( \frac{4}{7}, \frac{6}{7}, \frac{9}{7} \right)\).
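As an independent check on this answer, one can use a different route from the one in the text: since \(W^{\perp}\) is spanned by the normal vector \(\vec{n} = (3, 1, -2)\) of the plane, the projection onto \(W\) equals \(\vec{y}\) minus its projection onto \(\vec{n}\). The sketch below assumes this shortcut and uses illustrative variable names.

```python
from fractions import Fraction


def dot(a, b):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(p * q for p, q in zip(a, b))


n = [3, 1, -2]    # normal vector of the plane 3x + y - 2z = 0
y = [1, 1, 1]     # position vector of the point Y

# Remove from y its component along the normal; what remains lies in the plane.
c = Fraction(dot(y, n), dot(n, n))
z = [yi - c * ni for yi, ni in zip(y, n)]

print(z)           # components 4/7, 6/7, 9/7
print(dot(z, n))   # 0, so z satisfies the plane equation
```

Both routes give the same point, which is a useful sanity check on the Gram-Schmidt arithmetic.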

Least Squares Approximation

It should not be surprising to hear that many problems do not have a perfect solution, and in these cases the objective is always to try to do the best possible. For example, what does one do if there are no solutions to a system of linear equations \(A\vec{x}=\vec{b}\)? It turns out that what we do is find \(\vec{x}\) such that \(A\vec{x}\) is as close to \(\vec{b}\) as possible. A very important technique that follows from orthogonal projections is that of the least squares approximation, which allows us to do exactly that.

We begin with a lemma.

Recall that we can form the image of an \(m \times n\) matrix \(A\) by \(\mathrm{im}\left( A \right) = \left\{ A\vec{x} : \vec{x} \in \mathbb{R}^n \right\}\). Rephrasing Theorem [thm:orthproj] using the subspace \(W=\mathrm{im}\left( A \right)\) gives the equivalence of an orthogonality condition with a minimization condition. The following picture illustrates this orthogonality condition and geometric meaning of this theorem.

Theorem (PageIndex{1}): Existence of Minimizers

Let \(\vec{y}\in \mathbb{R}^{m}\) and let \(A\) be an \(m \times n\) matrix.

Choose \(\vec{z}\in W= \mathrm{im}\left( A \right)\) given by \(\vec{z} = \mathrm{proj}_{W}\left( \vec{y} \right)\), and let \(\vec{x} \in \mathbb{R}^{n}\) such that \(\vec{z}=A\vec{x}\).

Then

1. \(\vec{y} - A\vec{x} \in W^{\perp}\)
2. \( \| \vec{y} - A\vec{x} \| < \| \vec{y} - \vec{u} \| \) for all \(\vec{u} \neq \vec{z} \in W\)

We note a simple but useful observation.

Lemma (PageIndex{1}): Transpose and Dot Product

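Let \(A\) be an \(m \times n\) matrix. Then for all \(\vec{x} \in \mathbb{R}^n\) and \(\vec{y} \in \mathbb{R}^m\), \[A\vec{x} \cdot \vec{y} = \vec{x}\cdot A^T\vec{y}\]

Proof

This follows from the definitions, since the \((j,i)\) entry of \(A^T\) is \(a_{ij}\): \[A\vec{x} \cdot \vec{y}=\sum_{i,j}a_{ij}x_{j} y_{i} =\sum_{i,j}x_{j} \left( A^T \right)_{ji} y_{i}= \vec{x} \cdot A^T\vec{y}\]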
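A quick numerical check of this identity may help fix the dimensions in mind. The sketch below uses a \(3 \times 2\) matrix, so \(\vec{x}\) has two entries and \(\vec{y}\) has three; the particular values of `x` and `y` are arbitrary test data, and the helper names are illustrative.

```python
def dot(a, b):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(p * q for p, q in zip(a, b))


def matvec(M, v):
    """Matrix-vector product, with M given as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


A = [[2, 1], [-1, 3], [4, 5]]   # 3 x 2 matrix
x = [3, -2]                     # vector in R^2 (arbitrary test values)
y = [1, 4, -1]                  # vector in R^3 (arbitrary test values)

lhs = dot(matvec(A, x), y)              # (A x) . y
rhs = dot(x, matvec(transpose(A), y))   # x . (A^T y)
print(lhs, rhs)   # -34 -34
```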

The next corollary gives the technique of least squares.

Corollary (PageIndex{1}): Least Squares and Normal Equation

A specific value of \(\vec{x}\) which solves the problem of Theorem [thm:existenceminimizerhs] is obtained by solving the equation \[A^TA\vec{x}=A^T\vec{y}\] Furthermore, there always exists a solution to this system of equations.

Proof

For \(\vec{x}\) the minimizer of Theorem [thm:existenceminimizerhs], \(\left( \vec{y}-A\vec{x} \right) \cdot A \vec{u} =0\) for all \(\vec{u} \in \mathbb{R}^{n}\) and from Lemma [lem:transposeanddotprod], this is the same as saying \[A^T\left( \vec{y}-A\vec{x} \right) \cdot \vec{u}=0\] for all \(\vec{u} \in \mathbb{R}^{n}\). This implies \[A^T\vec{y}-A^TA\vec{x}=\vec{0}.\] Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem of Theorem [thm:existenceminimizerhs].

Note that \(\vec{x}\) might not be unique, but \(A\vec{x}\), the closest point of \(A\left(\mathbb{R}^{n} \right)\) to \(\vec{y}\), is unique, as was shown in the above argument.

Consider the following example.

Example (PageIndex{16}): Least Squares Solution to a System

Find a least squares solution to the system \[\left[ \begin{array}{rr} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{c} 2 \\ 1 \\ 1 \end{array} \right]\]

Solution

First, consider whether there exists a real solution. To do so, set up the augmented matrix given by \[\left[ \begin{array}{rr|r} 2 & 1 & 2 \\ -1 & 3 & 1 \\ 4 & 5 & 1 \end{array} \right]\] The reduced row-echelon form of this augmented matrix is \[\left[ \begin{array}{rr|r} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]\]

It follows that there is no real solution to this system. Therefore we wish to find the least squares solution. The normal equations are \[\begin{aligned} A^T A \vec{x} &= A^T \vec{y} \\ \left[ \begin{array}{rrr} 2 & -1 & 4 \\ 1 & 3 & 5 \end{array} \right] \left[ \begin{array}{rr} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] &=\left[ \begin{array}{rrr} 2 & -1 & 4 \\ 1 & 3 & 5 \end{array} \right] \left[ \begin{array}{c} 2 \\ 1 \\ 1 \end{array} \right]\end{aligned}\] and so we need to solve the system \[\left[ \begin{array}{rr} 21 & 19 \\ 19 & 35 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{r} 7 \\ 10 \end{array} \right]\] This is a familiar exercise and the solution is \[\left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{c} \frac{5}{34} \\ \frac{7}{34} \end{array} \right]\]
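The normal equations for this example can be formed and solved in a few lines. The following is a minimal sketch in plain Python, assuming a \(2 \times 2\) system with nonzero determinant so that Cramer's rule applies; the function name `solve_2x2` is an illustrative choice.

```python
from fractions import Fraction


def dot(a, b):
    """Dot product of two vectors given as sequences of numbers."""
    return sum(p * q for p, q in zip(a, b))


def solve_2x2(M, b):
    """Solve a 2x2 system by Cramer's rule (assumes a nonzero determinant)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x = Fraction(b[0] * M[1][1] - M[0][1] * b[1], det)
    y = Fraction(M[0][0] * b[1] - b[0] * M[1][0], det)
    return [x, y]


A = [[2, 1], [-1, 3], [4, 5]]
rhs = [2, 1, 1]

cols = list(zip(*A))                                   # columns of A = rows of A^T
AtA = [[dot(ci, cj) for cj in cols] for ci in cols]    # A^T A
Aty = [dot(ci, rhs) for ci in cols]                    # A^T y

print(AtA)                    # [[21, 19], [19, 35]]
print(Aty)                    # [7, 10]
print(solve_2x2(AtA, Aty))    # components 5/34, 7/34
```

For larger systems one would typically replace Cramer's rule with Gaussian elimination or a library routine.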

Consider another example.

Example (PageIndex{17}): Least Squares Solution to a System

Find a least squares solution to the system \[\left[ \begin{array}{rr} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{c} 3 \\ 2 \\ 9 \end{array} \right]\]

Solution

First, consider whether there exists a real solution. To do so, set up the augmented matrix given by \[\left[ \begin{array}{rr|r} 2 & 1 & 3 \\ -1 & 3 & 2 \\ 4 & 5 & 9 \end{array} \right]\] The reduced row-echelon form of this augmented matrix is \[\left[ \begin{array}{rr|r} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{array} \right]\]

It follows that the system has a solution given by \(x=y=1\). However we can also use the normal equations and find the least squares solution. \[\left[ \begin{array}{rrr} 2 & -1 & 4 \\ 1 & 3 & 5 \end{array} \right] \left[ \begin{array}{rr} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{rrr} 2 & -1 & 4 \\ 1 & 3 & 5 \end{array} \right] \left[ \begin{array}{r} 3 \\ 2 \\ 9 \end{array} \right]\] Then \[\left[ \begin{array}{rr} 21 & 19 \\ 19 & 35 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{c} 40 \\ 54 \end{array} \right]\]

The least squares solution is \[\left[ \begin{array}{c} x \\ y \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\] which is the same as the solution found above.

An important application of Corollary [cor:normalEquation] is the problem of finding the least squares regression line in statistics. Suppose you are given points in the \(xy\) plane \[\left\{ \left( x_{1},y_{1} \right), \left( x_{2},y_{2} \right), \cdots, \left( x_{n},y_{n} \right) \right\}\] and you would like to find constants \(m\) and \(b\) such that the line \(y=mx+b\) goes through all these points. Of course this will be impossible in general. Therefore, we try to find \(m,b\) such that the line will be as close as possible. The desired system is

\[\left[ \begin{array}{c} y_{1} \\ \vdots \\ y_{n} \end{array} \right] =\left[ \begin{array}{cc} x_{1} & 1 \\ \vdots & \vdots \\ x_{n} & 1 \end{array} \right] \left[ \begin{array}{c} m \\ b \end{array} \right]\]

which is of the form \(\vec{y}=A\vec{x}\). It is desired to choose \(m\) and \(b\) to make

\[\left\| A\left[ \begin{array}{c} m \\ b \end{array} \right] -\left[ \begin{array}{c} y_{1} \\ \vdots \\ y_{n} \end{array} \right] \right\| ^{2}\]

as small as possible. According to Theorem [thm:existenceminimizerhs] and Corollary [cor:normalEquation], the best values for \(m\) and \(b\) occur as the solution to

\[A^{T}A\left[ \begin{array}{c} m \\ b \end{array} \right] =A^{T}\left[ \begin{array}{c} y_{1} \\ \vdots \\ y_{n} \end{array} \right] , \;\mbox{where}\; A=\left[ \begin{array}{cc} x_{1} & 1 \\ \vdots & \vdots \\ x_{n} & 1 \end{array} \right]\]

Thus, computing \(A^{T}A\) and \(A^{T}\vec{y}\),

\[\left[ \begin{array}{cc} \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i} \\ \sum_{i=1}^{n}x_{i} & n \end{array} \right] \left[ \begin{array}{c} m \\ b \end{array} \right] =\left[ \begin{array}{c} \sum_{i=1}^{n}x_{i}y_{i} \\ \sum_{i=1}^{n}y_{i} \end{array} \right]\]

Solving this system of equations for \(m\) and \(b\) (using Cramer’s rule for example) yields:

\[m= \frac{-\left( \sum_{i=1}^{n}x_{i} \right) \left( \sum_{i=1}^{n}y_{i} \right) +\left( \sum_{i=1}^{n}x_{i}y_{i} \right) n}{\left( \sum_{i=1}^{n}x_{i}^{2} \right) n-\left( \sum_{i=1}^{n}x_{i} \right) ^{2}}\] and \[b=\frac{-\left( \sum_{i=1}^{n}x_{i} \right) \sum_{i=1}^{n}x_{i}y_{i}+\left( \sum_{i=1}^{n}y_{i} \right) \sum_{i=1}^{n}x_{i}^{2}}{\left( \sum_{i=1}^{n}x_{i}^{2} \right) n-\left( \sum_{i=1}^{n}x_{i} \right) ^{2}}.\]

Consider the following example.

Example (PageIndex{18}): Least Squares Regression

Find the least squares regression line \(y=mx+b\) for the following set of data points: \[\left\{ (0,1), (1,2), (2,2), (3,4), (4,5) \right\}\]

Solution

In this case we have \(n=5\) data points and we obtain: \[\begin{array}{ll} \sum_{i=1}^{5}x_{i} = 10 & \sum_{i=1}^{5}y_{i} = 14 \\ \sum_{i=1}^{5}x_{i}y_{i} = 38 & \sum_{i=1}^{5}x_{i}^{2} = 30 \end{array}\] and hence \[\begin{aligned} m &= \frac{- 10 \cdot 14 + 5 \cdot 38}{5 \cdot 30-10^2} = 1.00 \\ b &= \frac{- 10 \cdot 38 + 14 \cdot 30}{5 \cdot 30-10^2} = 0.80 \end{aligned}\]

The least squares regression line for the set of data points is: \[y = x+0.8\]

One could use this line to approximate other values for the data. For example, for \(x=6\) one could use \(y(6)=6+0.8=6.8\) as an approximate value for the data.
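The closed-form expressions for \(m\) and \(b\) translate directly into code. The sketch below is one possible implementation using exact fractions; the function name `regression_line` is an illustrative choice, and it assumes the \(x\)-values are not all equal so that the denominator is nonzero.

```python
from fractions import Fraction


def regression_line(points):
    """Return (m, b) for the least squares line y = m*x + b.

    Assumes at least two distinct x-values, so the denominator is nonzero.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    denom = n * sxx - sx ** 2
    m = Fraction(n * sxy - sx * sy, denom)
    b = Fraction(sy * sxx - sx * sxy, denom)
    return m, b


points = [(0, 1), (1, 2), (2, 2), (3, 4), (4, 5)]
m, b = regression_line(points)
print(m, b)        # 1 4/5
print(m * 6 + b)   # 34/5, that is 6.8
```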

The following diagram shows the data points and the corresponding regression line.

One could clearly do a least squares fit for curves of the form \(y=ax^{2}+bx+c\) in the same way. In this case you want to solve as well as possible for \(a,b,\) and \(c\) the system \[\left[ \begin{array}{ccc} x_{1}^{2} & x_{1} & 1 \\ \vdots & \vdots & \vdots \\ x_{n}^{2} & x_{n} & 1 \end{array} \right] \left[ \begin{array}{c} a \\ b \\ c \end{array} \right] =\left[ \begin{array}{c} y_{1} \\ \vdots \\ y_{n} \end{array} \right]\] and one would use the same technique as above. Many other similar problems are important, including many in higher dimensions and they are all solved the same way.
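To illustrate the quadratic case, the sketch below builds the normal equations \(A^TA\vec{x}=A^T\vec{y}\) for the three-column matrix above and solves them by Gauss-Jordan elimination with exact fractions. The data points are hypothetical, chosen to lie exactly on \(y = x^2 + 1\) so that the expected coefficients are known in advance; the function names are illustrative, and the solver assumes \(A^TA\) is invertible.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination (assumes M is invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def quadratic_fit(points):
    """Least squares coefficients (a, b, c) of y = a*x^2 + b*x + c."""
    A = [[x * x, x, 1] for x, _ in points]
    y = [yi for _, yi in points]
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Aty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(AtA, Aty)


points = [(0, 1), (1, 2), (2, 5), (3, 10)]   # hypothetical data on y = x^2 + 1
a, b, c = quadratic_fit(points)
print(a, b, c)   # 1 0 1
```

With noisy data the same code returns the best-fitting coefficients rather than an exact match.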

What does it mean when two functions are "orthogonal", and why is it important?

I have often come across the concept of orthogonality and orthogonal functions, e.g. in Fourier series the basis functions are cosine and sine, and they are orthogonal. For vectors, being orthogonal means that they are actually perpendicular, such that their dot product is zero. However, I am not sure how sine and cosine are actually orthogonal. They are 90 degrees out of phase, but there must be a different reason why they are considered orthogonal. What is that reason? Does being orthogonal really have something to do with geometry, i.e. 90 degree angles?

Why do we want to have orthogonal things so often in maths? Especially with transforms like the Fourier transform, we want to have an orthogonal basis. What does that even mean? Is there something magical about things being orthogonal?

Lesson 11

Let’s analyze graphs of functions to learn about their domain and range.

11.1: Which One Doesn't Belong: Unlabeled Graphs

Description: A graph with origin O. The graph is a step function with 6 horizontal line segments beginning with a closed circle and ending with an open circle. The next horizontal line begins where the previous line ends.

Description: A graph with origin O. The graph increases and decreases quickly and is shaped like a wave. The height of the waves are larger in the middle than on the ends.

11.2: Time on the Swing

A child gets on a swing in a playground, swings for 30 seconds, and then gets off the swing.

Here are descriptions of four functions in the situation and four graphs representing them.

The independent variable in each function is time, measured in seconds.

Attribution: Toddler on Swing, by Max Pixel. Public Domain.

Match each function with a graph that could represent it. Then, label the axes with the appropriate variables. Be prepared to explain how you make your matches.

• Function \(h\): The height of the swing, in feet, as a function of time since the child gets on the swing
• Function \(r\): The amount of time left on the swing as a function of time since the child gets on the swing
• Function \(d\): The distance, in feet, of the swing from the top beam (from which the swing is suspended) as a function of time since the child gets on the swing
• Function \(s\): The total number of times an adult pushes the swing as a function of time since the child gets on the swing

Description: A graph with origin O. The graph is a step function with 6 horizontal line segments beginning with a closed circle and ending with an open circle. The next horizontal line begins where the previous line ends.

Description: A graph with origin O. The graph increases and decreases quickly and is shaped like a wave. The height of the waves are larger in the middle than on the ends.

11.3: Back to the Bouncing Ball

A tennis ball was dropped from a certain height. It bounced several times, rolled along for a short period, and then stopped. Function \(H\) gives its height over time.

Here is a partial graph of \(H\). Height is measured in feet. Time is measured in seconds.

Be prepared to explain what each value or set of values means in this situation.

1. Find \(H(0)\).
2. Solve \(H(x) = 0\).
3. Describe the domain of the function.
4. Describe the range of the function.

In function \(H\), the input was time in seconds and the output was height in feet.

Think about some other quantities that could be inputs or outputs in this situation.

1. Describe a function whose domain includes only integers. Be sure to specify the units.
2. Describe a function whose range includes only integers. Be sure to specify the units.
3. Sketch a graph of each function.


Summary

The graph of a function can sometimes give us information about its domain and range.

Here are graphs of two functions we saw earlier in the unit. The first graph represents the best price of bagels as a function of the number of bagels bought. The second graph represents the height of a bungee jumper as a function of seconds since the jump began.

What are the domain and range of each function?

The number of bagels cannot be negative but could include 0 (no bagels bought). The domain of the function therefore includes 0 and positive whole numbers, or \(n \geq 0\).

The best price can be 0 (for buying 0 bagels), certain multiples of 1.25, certain multiples of 6, and so on. The range includes 0 and certain positive values.

Description: A graph with origin O. The horizontal axis, number of bagels, scale from 0 to 14 by 1s. The vertical axis, best price in dollars, scale from 0 to 16 by 2s. Points are marked with an X at 0 comma 0, 1 comma 1.25, 2 comma 2.5, 3 comma 3.75, 4 comma 5, 5 comma 5.25, 6 comma 6, 7 comma 7, 8 comma 8, 9 comma 8, 10 comma 11.25, 11 comma 12.5, 12 comma 10, and 13 comma 11.25.

The domain of the height function would include any amount of time since the jump began, up until the jump is complete. From the graph, we can tell that this happened more than 70 seconds after the jump began, but we don't know the exact value of \(t\).

The graph shows a maximum height of 80 meters and a minimum height of 10 meters. We can conclude that the range of this function includes all values that are at least 10 and at most 80.

Description: A graph with origin O. The horizontal axis, labeled t, seconds, scale from 0 to 70 by 10s. The vertical axis, labeled h, meters, scale from 0 to 90 by 10s. The curve crosses the y axis at 0 comma 80. The curve trends down to the point 9 comma 10. The curve trends up to the point 18 comma 42. The curve trends down to the point 28 comma 20. The curve trends up to the point 38 comma 38. The curve trends down to the point 46 comma 25. The curve trends up to the point 56 comma 35. The curve trends down to the point 65 comma 27. The curve trends up and off the grid.

1. Introduction

Why is linear algebra important in machine learning? Machine learning methods often involve a large amount of data, and linear algebra provides a clever way to analyze and manipulate it. To make the argument concrete, let's take a look at a sample dataset.

1.1 Boston house prices dataset

We can load the Boston dataset from the sklearn package, which is a very popular and easy-to-use machine learning package for Python. It implements many kinds of machine learning algorithms and utility functions. The loaded dataset has the following attributes.

The data and target values are stored in arrays of type numpy.ndarray. In the data array, each row corresponds to a sample, a Boston suburb or town in this example, and each column corresponds to a feature that is described above. Note that numpy.ndarray is not just a multi-dimensional array (or list in Python). It implements many useful numeric methods and indexing features. Refer to the ndarray document and indexing document for the details. Here, I show the first 10 samples, each of which consists of 13 feature values, and some of their statistics by slicing the data array.

The target values are the following.

Our task here is to predict the target value, or the "median value of owner-occupied homes in $1000's" in a Boston town, given its feature values such as "per capita crime rate by town" and "average number of rooms per dwelling."

1.2 Linear regression

Linear regression is one of the simplest statistical models. It assumes that the target variable $y$ is explained by a weighted sum of feature values $x_1, x_2, \dots, x_n$. In an equation,

$$y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b,$$

where $b$ is a bias term. Intuitively, the $w_*x_*$ terms define the relative up/down from the standard target value. This standard value is what the bias term accounts for.

You may wonder if the relationship is really that simple.

"Essentially, all models are wrong, but some are useful." George Box, 1987

Assuming that the linear regression model is valid and we know all the weights and bias, we can estimate a median house price of a Boston town from its feature values. The bad news is that we don't know the weights. The good news is that we have training samples (a set of features and target pair)! We want to find a set of weights such that the equation holds for the training samples. To this end, we can solve systems of equations.

Great, we can solve it... or can we (more on this later)? Let's rewrite the equations with a better notation.

Yes, this is beautiful. This notation is used in linear algebra, and it is a very powerful tool given to us to tackle machine learning problems. The objective here is to find a set of weights $\boldsymbol{w}$ that solves this equation. We call this process learning from data.

Lesson 11

In this lesson students are introduced to contexts involving markups, discounts, and commissions, and they continue to study contexts involving tax and tips.

Questions about rounding may naturally come up in this lesson. This lesson primarily involves dollar amounts, so it is sensible to round to the nearest cent (the nearest hundredth of a dollar). Percentages may be rounded to the nearest whole percent or fraction of a percent, depending on the situation.

Learning Goals

Let’s learn about more situations that involve percentages.

Required Preparation

Print and cut up slips from the Card Sort: Percentage Situations blackline master. Prepare 1 copy for every 2 students. These may be re-used if you have multiple classes.

It is recommended that students be provided access to four-function calculators so that they can focus on reasoning about how numbers are related to each other, representing those relationships, and deciding which operations are appropriate (rather than focusing on computation).

IM 6–8 Math was originally developed by Open Up Resources and authored by Illustrative Mathematics®, and is copyright 2017-2019 by Open Up Resources. It is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). OUR's 6–8 Math Curriculum is available at https://openupresources.org/math-curriculum/.

Adaptations and updates to IM 6–8 Math are copyright 2019 by Illustrative Mathematics, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Adaptations to add additional English language learner supports are copyright 2019 by Open Up Resources, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The second set of English assessments (marked as set "B") are copyright 2019 by Open Up Resources, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Spanish translation of the "B" assessments are copyright 2020 by Illustrative Mathematics, and are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The Illustrative Mathematics name and logo are not subject to the Creative Commons license and may not be used without the prior and express written consent of Illustrative Mathematics.

This site includes public domain images or openly licensed images that are copyrighted by their respective owners. Openly licensed images remain under the terms of their respective licenses. See the image attribution section for more information.

11.2: A Car Dealership (10 minutes)

Activity

The purpose of this activity is to introduce students to a context involving markups and markdowns or discounts, and to connect this to the work on percent increase and percent decrease they did earlier. The first question helps set the stage for students to see the connection to markups and percent increase. Look for students who solve the second question by finding 90% of the retail price, and highlight this approach in the discussion.

Launch

Tell students that a mark-up is a percentage that businesses often add to the price of an item they sell, and a mark-down is a percentage they take off of a given price. If helpful, review the meaning of wholesale (the price the dealership pays for the car) and retail price (the price the dealership charges to sell the car). Sometimes people call mark-downs discounts.

Provide access to calculators. Students in groups of 2. Give students 5 minutes of quiet work time, followed by partner then whole-class discussion.

A car dealership pays a wholesale price of $12,000 to purchase a vehicle.

The car dealership wants to make a 32% profit.

1. By how much will they mark up the price of the vehicle?
2. After the markup, what is the retail price of the vehicle?

Attribution: Cars, by Pexels. Public Domain. Pixabay.


This car dealership pays the salesperson a bonus for selling the car equal to 6.5% of the sale price. How much commission did the salesperson lose when they decided to offer a 10% discount on the price of the car?


Anticipated Misconceptions

It is important throughout that students attend to the meanings of particular words and remain clear on the meaning of the different values they find. For example, "wholesale price," "retail price," and "sale price" all refer to specific dollar amounts. Help students organize their work by labeling the different quantities they find or creating a graphic organizer.

Activity Synthesis

For the first question, help students connect markups to percent increase.

Select students to share solutions to the second question. Highlight finding 90% of the retail price, and reinforce that a 10% discount is a 10% decrease.

Ask them to describe how they would find (but not actually find) . . .

• "The retail price after a 12% markup?" (Multiply the retail price by 0.12, then add that answer to the retail price. Alternatively, multiply the retail price by 1.12.)
• "The price after a 24% discount?" (Multiply the retail price by 0.24, then subtract that answer from the retail price. Alternatively, multiply the retail price by 0.76.)
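The "multiply by one factor" approach described in the synthesis can be written out as a short calculation. The sketch below assumes that the dealership's 32% profit is taken as a markup on the 12,000 dollar wholesale price and that the 10% discount is applied to the resulting retail price; the variable names are illustrative.

```python
from decimal import Decimal

wholesale = Decimal("12000")
markup_rate = Decimal("0.32")
discount_rate = Decimal("0.10")

markup = wholesale * markup_rate           # amount added to the wholesale price
retail = wholesale * (1 + markup_rate)     # same as wholesale + markup
sale = retail * (1 - discount_rate)        # 90% of the retail price

print(markup, retail, sale)
```

`Decimal` is used instead of floating point so that dollar amounts stay exact.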

Time Series Analysis: Methods and Applications

Lara Fontanella, Luigi Ippoliti, in Handbook of Statistics, 2012

6 Discussion

This chapter has illustrated that the KL technique performs well in the extraction of specific features of temporal and spatio-temporal data. We have also shown that KL (and EOF) analysis is a useful tool for dimensionality reduction. We began by reviewing the conventional KL method for one-dimensional processes; then we described the decomposition and reconstruction phases to illustrate the specific steps of the analysis. Of course, a common goal of time series analysis is extrapolating past behavior into the future. Here, we have not considered the forecasting problem, but specific details, including how to specify forecast confidence bounds, are given in the study by Golyandina et al. (2001, Section 2.4).

An important part of data modeling is the specification of the trajectory matrix and its parameter K, which defines the window length and, hence, the number N of the delayed copies of the series. The numerical value of K is determined experimentally because, in practice, its choice is guided by both the length of the signal and the number of components thought to be present in V(t). The choice of K is also very much like the choice of the length of the support of the wavelets, for example in a Daubechies wavelet transform.
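To make the role of the window length concrete, here is a minimal sketch of building a trajectory matrix from a short series. It assumes the common singular-spectrum convention that a series of length T with window length K yields N = T - K + 1 delayed copies; the chapter's own definition appears in an earlier section and may differ. The function name `trajectory_matrix` and the sample series are hypothetical.

```python
def trajectory_matrix(series, window):
    """Return a window x n matrix (list of rows) of delayed copies.

    Assumption: n = len(series) - window + 1, the usual convention;
    column j holds the values series[j], ..., series[j + window - 1].
    """
    t = len(series)
    if not 1 <= window <= t:
        raise ValueError("window must be between 1 and len(series)")
    n = t - window + 1  # number of delayed copies
    return [[series[i + j] for j in range(n)] for i in range(window)]


# Hypothetical series of length 6 with window length 3.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = trajectory_matrix(series, window=3)
for row in x:
    print(row)
```

Each anti-diagonal of the result is constant, so a larger window gives more rows but fewer delayed copies. That is the trade-off behind choosing K experimentally.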

A multiresolution version of the KL was also discussed. MR-KL allows for a nonlinear approximation, which is better suited for denoising purposes. The link between the MR-KL and other well-known multiresolution decompositions, including wavelets, may be examined by employing the system approach proposed by Unser (1993). However, MR-KL is characterized by basis functions that are data adaptive. In contrast with wavelets and Fourier analysis, the KL model does not require an advance specification of the functional form of the eigenfunctions, leaving it to be freely determined by the structure of the data. As shown in Section 4, the possibility of deriving the basis functions from the covariance structure of the data allowed us to extend the theory within the generalized eigendecomposition. This is particularly useful when two signals are available, as described, for example, in the studies by Merla et al. (2004) and Shastri et al. (2009).

The chapter also discussed the application of KLE in a spatio-temporal context. The expansion of the process has been defined within a state-space framework that allows the expansion coefficients to be estimated through the Kalman recursions. Note that this approach contrasts with that described by Hannachi et al. (2007), where the spatial patterns and the expansion coefficients are obtained through the singular value decomposition of the spatio-temporal matrix.

Obviously, it is difficult to provide a total overview of a field that is too broad for us to be exhaustive. For example, we have not discussed some other extensions of EOFs including cyclostationary, PXEOFs, the S-mode EOF analysis, trend EOFs, and nonlinear extensions of PCA. Also, we have not used this chapter to fully describe the use of KL (EOF) in atmospheric sciences. For all these points, there are several reference books and review papers and we refer the interested reader to them for specific details.

SIAM Journal on Matrix Analysis and Applications

This paper arose from a fascinating observation, apparently by Charles Sheffield, and relayed to us by Gene Golub, that the QR factorization of an $m \times n$ matrix A via the modified Gram-Schmidt algorithm (MGS) is numerically equivalent to that arising from Householder transformations applied to the matrix A augmented by an $n \times n$ zero matrix. This is explained in a clear and simple way, and then combined with a well-known rounding error result to show that the upper triangular matrix R from MGS is about as accurate as R from other QR factorizations. The special structure of the product of the Householder transformations is derived, and then used to explain and bound the loss of orthogonality in MGS. Finally this numerical equivalence is used to show how orthogonality in MGS can be regained in general. This is illustrated by deriving a numerically stable algorithm based on MGS for a class of problems which includes solution of nonsingular linear systems, a minimum 2-norm solution of underdetermined linear systems, and linear least squares problems. A brief discussion on the relative merits of such algorithms is included.
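For readers unfamiliar with MGS, the following is a minimal sketch of the textbook modified Gram-Schmidt QR factorization in plain Python. It is not the paper's stabilized algorithm and does not implement the Householder equivalence; it only illustrates the basic procedure the abstract refers to. The function name `mgs_qr` and the sample matrix are hypothetical, and the columns of A are assumed to be linearly independent.

```python
import math


def mgs_qr(a):
    """Factor an m x n matrix (list of rows) as A = QR by modified Gram-Schmidt.

    Returns q as a list of n orthonormal columns (each of length m) and
    r as an n x n upper triangular matrix (list of rows).
    Assumption: the columns of a are linearly independent.
    """
    m, n = len(a), len(a[0])
    # Work on a copy of the columns of A.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        qk = [x / r[k][k] for x in v[k]]
        q.append(qk)
        # Immediately remove the q_k component from every remaining column;
        # this update of the working columns is what distinguishes MGS
        # from classical Gram-Schmidt.
        for j in range(k + 1, n):
            r[k][j] = sum(qk[i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * qk[i] for i in range(m)]
    return q, r


# Hypothetical 3 x 2 example.
a = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
q, r = mgs_qr(a)
```

In exact arithmetic the columns in `q` are orthonormal. In floating point they can lose orthogonality when A is ill-conditioned, which is the effect the paper explains and bounds.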

Mathematics Grade 11 Textbook for Ethiopia

This is the Mathematics Grade 11 textbook for Ethiopian students. As technology improves, students need access to the Mathematics textbook as an app. Searching the Google Play Store for "Mathematics grade 11 textbook for Ethiopia" brings up the app.

Carrying the Mathematics Grade 11 textbook by hand as a book and simply installing the app are two different things. The latter option is preferable to the former, and it follows the direction of the Ethiopian 10-year plan and ESDP VI to produce more digital textbooks like the one we have developed.

Thank you for reading this full description. We recommend installing the Mathematics Grade 11 textbook if you are a student, or telling your friends that such textbooks are available in the Google Play Store. Fuad, Somali REB Planning Directorate Director in Jigjiga, developed this app for Ethiopian students in Ethiopia and abroad.

11.2: Time on the Swing (20 minutes)

Activity

In this activity, students are given the same four graphs they saw in the warm-up and four descriptions of functions and are asked to match them. All of the functions share the same context. Students then use these features to reason about likely domain and range of each function.

To make the matches, students analyze and interpret features of the graphs, looking for and making use of structure in the situation and in the graphs (MP7). Here are some possible ways students may reason about each match:

• The swing goes up and down while the child is swinging, so D could be a graph for function \(h\).
• The time left on the swing decreases as the time on the swing increases, so A is a possible graph for function \(r\).
• The distance of the child from the top beam of the swing doesn't change as long as the child is on the swing, so C is a possible graph for function \(d\).
• The total number of times the swing is pushed must be a counting number and cannot be fractional. Graph B has multiple pieces and each one could represent the total number of pushes for certain intervals of time, so B is a possible graph of \(s\).

Students reason quantitatively and abstractly as they connect verbal and graphical representations of functions and as they think about the domain and range of each function (MP2).

Launch

Ask students to imagine a child getting on a swing, swinging for 30 seconds, and then getting off the swing. Explain that they will look at four functions that can be found in this situation. Their job is to match verbal descriptions and graphs that define the same functions, and then to think about reasonable domain and range for each function. Tell students they will need additional information for the last question.

Arrange students in groups of 2. Give students a few minutes of quiet time to think about the first two questions, and then time to discuss their thinking with their partner. Follow with a whole-class discussion.

Invite students to share their matching decisions and explanations as to how they know each pair of representations belong together. Make sure that students can offer an explanation for each match, including for their last pair (other than because the description and the graph are the only pair left). See some possible explanations in the Activity Narrative.

Next, ask students to share the points that they think would be helpful for determining the domain and range of each function. If students gesture to the intercepts, a maximum, or a minimum on a graph but do not use those terms to refer to points, ask them to use mathematical terms to clarify what they mean.

Here is the information students will need for the last question. Display it for all to see, or provide it as requested. If the requested information is not shown or cannot be reasoned from what is available, ask students to request a different piece of information.

• The child is given 30 seconds on the swing.
• While the child is on the swing, an adult pushes the swing a total of 5 times.
• The swing is 1.5 feet (18 inches) above ground.
• The chains that hold the seat and suspend it from the top beam are 7 feet long.
• The highest point that the child swings up to is 4 feet above the ground.
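One way to organize this information is sketched below. It assumes the function meanings suggested in the Activity Narrative (h: height of the swing above ground, r: time remaining, d: distance from the top beam, s: total number of pushes), that time is measured in seconds from when the child gets on, and that the push count starts at 0. These are illustrative assumptions, not the official student response.

```python
# Given information from the list above.
SWING_TIME = 30       # seconds on the swing
TOTAL_PUSHES = 5      # pushes by the adult
SEAT_HEIGHT = 1.5     # feet above ground at the lowest point
CHAIN_LENGTH = 7.0    # feet from the top beam to the seat
MAX_HEIGHT = 4.0      # feet above ground at the highest point

# Every function takes time t as input, with 0 <= t <= 30 (assumed).
domain = (0, SWING_TIME)

# Possible ranges under the stated assumptions.
ranges = {
    "h": (SEAT_HEIGHT, MAX_HEIGHT),           # interval of heights, in feet
    "r": (0, SWING_TIME),                     # interval of seconds remaining
    "d": (CHAIN_LENGTH, CHAIN_LENGTH),        # constant distance, in feet
    "s": list(range(0, TOTAL_PUSHES + 1)),    # whole-number push counts only
}

for name, values in ranges.items():
    print(name, "domain:", domain, "range:", values)
```

The range of s is a list of whole numbers, not an interval, which matches the reasoning that the number of pushes cannot be fractional.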

If time is limited, ask each partner to choose two functions (different than their partner's) and write the domain and range only for those functions.