 # Dr. Mark V. Sapir

## Orthogonal complements, orthogonal bases

Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold.

1. Vc is a subspace.
2. V intersect with Vc is {0}.
3. Every element w in W is uniquely represented as a sum v+v' where v is in V, v' is in Vc.
4. If W is a finite dimensional space then dim(Vc)+dim(V)=dim(W).
5. (Vc)c=V.

We shall return to this theorem later.

## Orthogonal complements and systems of linear equations

The following result shows an important connection between orthogonal complements and systems of linear equations.

Theorem. Let V be a subspace in Rn spanned by vectors s1=(a11,...,a1n), s2=(a21,...,a2n),..., sk=(ak1,...,akn). Then the following conditions for a vector v'=(x1,...,xn) are equivalent:

1. v' is in the orthogonal complement Vc of V.
2. v' is orthogonal to s1,...,sk.
3. (x1,...,xn) is a solution of the system of linear equations:

a11 x1 + ... + a1n xn = 0
a21 x1 + ... + a2n xn = 0
...............................
ak1 x1 + ... + akn xn = 0

Thus Vc consists of all solutions of this system of equations. The proof is left as an exercise.
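This description of Vc as a solution set is easy to try numerically. Below is a small numpy sketch (the spanning vectors are made up for illustration); the orthogonal complement is computed as the null space of the matrix whose rows are the spanning vectors, using the singular value decomposition.

```python
import numpy as np

# Rows of S span the subspace V of R^3 (vectors chosen for illustration).
S = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

# Vc is the solution set of S x = 0, i.e. the null space of S.
# One standard way to compute it: right singular vectors of S
# corresponding to zero singular values.
_, svals, Vt = np.linalg.svd(S)
rank = int(np.sum(svals > 1e-10))
Vc_basis = Vt[rank:]            # rows form an orthonormal basis of Vc

# Every basis vector of Vc is orthogonal to every spanning vector of V.
print(np.allclose(S @ Vc_basis.T, 0))   # True
```

Note that Vc_basis has 3-2=1 row here, in agreement with the property dim(Vc)+dim(V)=dim(W) from the theorem above.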

## Orthogonal bases. The Gram-Schmidt algorithm

Given an orthogonal basis {v1,...,vn}, one can get an orthonormal basis by dividing each vi by its length: {v1/||v1||,...,vn/||vn||}.

Suppose that we have a (not necessarily orthogonal) basis {s1,...,sn} of a Euclidean vector space V. The next procedure, called the Gram-Schmidt algorithm, produces an orthogonal basis {v1,...,vn} of V. Let

v1=s1

We shall find v2 as a linear combination of v1 and s2: v2=s2+xv1. Since v2 must be orthogonal to v1, we have:

0=<v2,v1>=<s2+xv1,v1>=<s2,v1> + x<v1,v1>.

Hence

x=-<s2,v1>/<v1,v1>,

so

v2=s2-(<s2,v1>/<v1,v1>) v1

Next we can find v3 as a linear combination of s3, v1 and v2. A similar calculation gives that

v3=s3 - (<s3,v1>/<v1,v1>) v1 - (<s3,v2>/<v2,v2>) v2.

Continuing in this manner, we get the general formula for vi:

vi=si - (<si,v1>/<v1,v1>) v1 - (<si,v2>/<v2,v2>) v2 - ... - (<si,vi-1>/<vi-1,vi-1>) vi-1

By construction, the set of vectors {v1,...,vn} is orthogonal. None of these vectors is 0. Indeed, if vi were equal to 0 then si, v1,...,vi-1 would be linearly dependent, which would imply that s1,...,si are linearly dependent (express each vj as a linear combination of s1,...,sj), which is impossible since {s1,...,sn} is a basis. This implies that {v1,...,vn} are linearly independent. Indeed, the following general theorem holds.

Theorem. Every set of pairwise orthogonal non-zero vectors is linearly independent. Proof. By contradiction, let {v1,...,vn} be a linearly dependent set of pairwise orthogonal non-zero vectors. Then, by the definition of linear dependence:

x1v1+...+xnvn = 0

for some numbers xi not all of which are zeroes. Suppose that xi is not 0. Take the dot product of the last equality with vi:

x1<v1,vi>+...+xi<vi,vi>+...+xn<vn,vi>=0

Since vi is orthogonal to every vector vj except vi itself, each dot product <vj,vi> is 0 except <vi,vi>. So

xi<vi,vi>=0

Since <vi,vi> is not 0 (because vi is not zero), we conclude that xi=0. This is a contradiction since we assumed that xi is not 0. Thus {v1,...,vn} is a linearly independent set of vectors in the Euclidean vector space V. Since dim(V)=n, by the theorem about dimension, this set is a basis in V. Therefore the Gram-Schmidt procedure gives an orthogonal basis of V.
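The Gram-Schmidt procedure described above is short to implement. Here is a numpy sketch (the starting basis is made up for illustration):

```python
import numpy as np

def gram_schmidt(S):
    """Turn the rows of S (a basis) into an orthogonal basis, as in the text:
    v_i = s_i - sum over j < i of (<s_i,v_j>/<v_j,v_j>) v_j."""
    V = []
    for s in S:
        v = s.astype(float).copy()
        for u in V:
            v -= (np.dot(s, u) / np.dot(u, u)) * u
        V.append(v)
    return np.array(V)

# A (not orthogonal) basis of R^3, chosen for illustration.
S = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(S)

# The Gram matrix V V^T must be diagonal: all pairwise dot products vanish.
G = V @ V.T
print(np.allclose(G, np.diag(np.diag(G))))   # True
```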

If a basis {v1,...,vn} is orthogonal, then it is very easy to find the coordinates of a vector a in this basis. Indeed, suppose that a=x1v1+...+xnvn. If we multiply this equality by vi, we get:

<a,vi>=xi<vi,vi>

(all other terms are 0 because <vi,vj>=0 if i is not equal to j). Thus

xi=<a,vi>/<vi,vi>

This is the formula for the i-th coordinate of a (i=1,2...,n). Notice that if the basis is orthonormal, the formula is even easier:

xi=<a,vi>

because in this case <vi,vi>=||vi||^2=1.
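As a quick numerical illustration of this coordinate formula (the orthogonal basis below is made up):

```python
import numpy as np

# An orthogonal (not orthonormal) basis of R^2, chosen for illustration.
v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])
a = np.array([3.0, 5.0])

# x_i = <a, v_i> / <v_i, v_i>
x1 = np.dot(a, v1) / np.dot(v1, v1)
x2 = np.dot(a, v2) / np.dot(v2, v2)

# The coordinates recover a as a linear combination of the basis.
print(np.allclose(x1 * v1 + x2 * v2, a))   # True
```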

Now we can return to the theorem about orthogonal complements. Here is its formulation: Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold.

1. Vc is a subspace.
2. V intersect with Vc is {0}.
3. Every element w in W is uniquely represented as a sum v+v' where v is in V, v' is in Vc.
4. If W is a finite dimensional space then dim(Vc)+dim(V)=dim(W).
5. (Vc)c=V.

Proof. We leave 1 and 2 as exercises.

3,4. Take any basis {s1,...,sk} of V. By the theorem about dimension, one can add vectors sk+1,...,sn and get a basis of W. Let us apply the Gram-Schmidt method to the basis {s1,...,sn} and get an orthogonal basis {v1,...,vn} of W. Then by construction, the first k vectors v1,...,vk belong to V (they are linear combinations of s1,...,sk). By the theorem about dimension, v1,...,vk form a basis of V. Let us show that vk+1,...,vn form a basis of Vc. Indeed, each of these vectors is orthogonal to each of v1,...,vk. Therefore each of vk+1,...,vn is orthogonal to every linear combination of v1,...,vk, so vk+1,...,vn are orthogonal to V. Thus these vectors belong to Vc. They are linearly independent by the theorem saying that every orthogonal set of non-zero vectors is linearly independent. Let w be any vector in Vc. Then

w=x1v1+...+xkvk+...+xnvn

Multiplying this equality by v1,...,vk, we get that x1,...,xk are 0. So every vector in Vc is a linear combination of vk+1,...,vn. Thus vk+1,...,vn is a basis of Vc. This immediately gives us the property 4.

Take any vector w in W. Then w=x1v1+...+xnvn=(x1v1+...+xkvk)+(xk+1vk+1+...+xnvn). So every vector in W is a sum of a vector in V and a vector in Vc.

Let us prove that this representation is unique. Suppose (by contradiction) that

v+v'=u+u'

where u and v are from V, u' and v' are from Vc, and either u is not equal to v or u' is not equal to v'. Then we can deduce that

v-u=u'-v'

Notice that since V and Vc are subspaces, v-u belongs to V and u'-v' belongs to Vc. Let us multiply both sides of this equality by v-u. We get:

<(v-u),(v-u)>=<(u'-v'),(v-u)>

The right hand side of this equality is 0 because u'-v' is orthogonal to v-u (and to any other vector from V). So <(v-u),(v-u)>=0. By a property of dot products, we conclude that v-u=0, that is, u=v. Similarly we get v'=u'. This contradiction completes the proof of part 3.

5. It is clear that (Vc)c contains V. On the other hand, by part 4, dim((Vc)c)=dim(W)-dim(Vc)=dim(V). Therefore V=(Vc)c.

## Projections on subspaces, distance from a vector to a subspace

The theorem about orthogonal complements tells us that if V is a subspace of a Euclidean vector space W and w is a vector from W then w=v+v' for some v in V and v' in the orthogonal complement Vc of V. We also know that this representation of w is unique. The vector v is then called the projection of w onto V; the vector v' is called the normal component of w.

The Gram-Schmidt procedure gives us the formula for the projection and the normal component. If v1,...,vk is an orthogonal basis of the subspace V then

v = (<w,v1>/<v1,v1>) v1 + (<w,v2>/<v2,v2>) v2 + ... + (<w,vk>/<vk,vk>) vk

v'=w - v

Indeed, v belongs to V because each vi is in V and V is closed under linear combinations. And it is easy to check (exercise) that v'=w-v is orthogonal to each of the vi. This implies that v' is orthogonal to every vector in V because vectors in V are linear combinations of v1,...,vk (check that if a vector p is orthogonal to vectors q1,...,qn, then p is orthogonal to any linear combination of q1,...,qn).
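These two formulas can be checked numerically. A numpy sketch, with a made-up orthogonal basis of a plane in R^3:

```python
import numpy as np

def project(w, V_basis):
    """Projection of w onto span of the rows of V_basis; the rows must be
    pairwise orthogonal (e.g. produced by Gram-Schmidt)."""
    v = np.zeros_like(w, dtype=float)
    for u in V_basis:
        v += (np.dot(w, u) / np.dot(u, u)) * u
    return v

# Orthogonal basis of a plane in R^3 (made up for illustration).
V_basis = np.array([[1.0, 0.0, 1.0],
                    [1.0, 0.0, -1.0]])
w = np.array([2.0, 3.0, 4.0])

v = project(w, V_basis)        # projection of w onto V
vp = w - v                     # normal component v'

# v' is orthogonal to both basis vectors of V.
print(np.allclose(V_basis @ vp, 0))   # True
```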

The theorem about orthogonal complements allows us to find distances from vectors to subspaces in any Euclidean vector space. Theorem. Let V be a subspace in a Euclidean vector space W and let w be a vector from W. Let w=v+v' where v is the projection of w onto V and v' is the normal component (as in the theorem about orthogonal complements). Then ||v'|| is the distance from w to V and v is the vector in V closest to w.

## Applications to systems of linear equations. Least squares solutions of systems of linear equations

Suppose that a system of linear equations Av=b with the m by n matrix of coefficients A does not have a solution. In this case we can look for a vector v for which Av is closest possible to b. Such a vector v is called a least squares solution of the system Av=b. The term is motivated by the fact that if v is a least squares solution of Av=b, then the sum of squares of coordinates (the square of the norm) of Av-b must be minimal.

The set V of vectors of the form Av is the range of the linear transformation with the standard matrix A, so it is the column space of the matrix A. In fact if v=(x1,...,xn) then

Av=x1c1+x2c2+...+xncn

where c1,...,cn are the column vectors of the matrix A. Thus we need to find the vector p in V such that the distance from b to p is the smallest. From the theorem about the distance from a vector to a subspace we know that p is the projection of b onto V. Thus in order to find v we need to execute the following procedure.

1. Find an orthogonal basis of the column space V of the matrix A.
2. Find the projection p of b onto V.
3. Represent p as a linear combination of the columns c1,...,cn of the matrix A. Then the coefficients of this linear combination form the vector v.

There exists an alternative procedure for finding a least squares solution. Notice that v is a least squares solution of the system Av=b if and only if Av-b is orthogonal to V (the column space of the matrix A). By the theorem about orthogonal complements in Rn, a vector z is orthogonal to V if and only if z is a solution of the system A^T z=0 where A^T is the transpose of A. Thus v is a least squares solution of Av=b if and only if v is a solution of the system A^T(Av-b)=0, or equivalently

A^T A v = A^T b

Thus in order to find a least squares solution of the system Av=b one needs to solve another system of linear equations (with the matrix of coefficients A^T A).
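A numpy sketch of this second procedure, on a made-up inconsistent system; the result is compared against numpy's built-in least squares routine:

```python
import numpy as np

# An inconsistent system Av = b: 3 equations, 2 unknowns (made up).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Normal equations: solve (A^T A) v = A^T b.
v = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with numpy's built-in least squares solver.
v_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(v, v_ref))   # True
```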

There exists yet another procedure for finding a least squares solution of a system of linear equations Av=b. We need to minimize the distance dist(b, Av). This distance is a function in n variables x1,...,xn. Thus our problem is just the problem of finding a minimum of a function in n variables. In fact, since the formula for dist(b, Av) contains a square root, it is more convenient to minimize the function f(v)=dist(b, Av)^2, which is just a quadratic polynomial. This can be done by finding the gradient of this function and solving the system of equations grad(f)=0.

## Change of basis

We know that vectors in different bases may have different coordinates, thus when we deal with vector spaces we often need to change a basis to get better coordinates for the vectors we are working with.

Let B={v1,...,vn} and B'={s1,...,sn} be bases of a vector space V. Then every vector from B is a linear combination of vectors from B':

v1 = a11 s1 +...+ a1n sn
..............................
vn = an1 s1 +...+ ann sn

Consider the following matrix A:

 [ a11 a21 ... an1 ]
 [ a12 a22 ... an2 ]
 [ ................. ]
 [ a1n a2n ... ann ]

This matrix is called the transition (transformation) matrix from B to B'. Take any vector w in V. Suppose that w has coordinates (x1,...,xn) in the basis B. This means that w=x1v1+...+xnvn. By substituting the expressions of the vi into this equality, we get the expression of w as a linear combination of vectors from B'. The coefficients in this linear combination are the coordinates of w in the basis B'. It is easy to compute that the vector of coordinates that we get is equal to Av where v=(x1,...,xn). Thus in order to get the coordinate vector v' of w in the basis B', one needs to multiply the transition matrix A from B to B' by the coordinate vector v of w in the basis B:

v'=Av

It is clear that if A' is the transition matrix from B' to B then A'v'=v. Thus AA'v=v for every vector v. This means that AA' is the identity matrix, so A is invertible. Thus the transition matrix from B to B' is invertible and the inverse matrix is the transition matrix from B' to B.
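A numerical sketch of a transition matrix in R^2 (both bases are made up for illustration). For bases of Rn given in standard coordinates, writing the basis vectors as columns of matrices PB and PBp lets us compute the transition matrix from B to B' as the inverse of PBp times PB:

```python
import numpy as np

# Bases B = {v1, v2} and B' = {s1, s2} of R^2, stored as columns (made up).
PB = np.array([[1.0, 1.0],
               [1.0, 2.0]])          # columns: v1=(1,1), v2=(1,2)
PBp = np.array([[1.0, 0.0],
                [1.0, 1.0]])         # columns: s1=(1,1), s2=(0,1)

# Transition matrix from B to B': converts B-coordinates to B'-coordinates.
A = np.linalg.solve(PBp, PB)         # PBp^-1 * PB

# Check on a vector: the same w, with coordinates converted by A.
x = np.array([2.0, -1.0])            # coordinates of some w in the basis B
w = PB @ x                           # w in standard coordinates
xp = A @ x                           # its coordinates in B'
print(np.allclose(PBp @ xp, w))      # True
```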

## Matrices of linear operators in finite dimensional spaces

Let B={b1,...,bn} be a basis in an n-dimensional vector space V. For every vector v in V let [v]B denote the column vector of coordinates of v in the basis B.

Recall that we constructed the standard matrix of a linear operator T in Rn in a similar manner: we took the standard basis in Rn, v1=(1,0,...,0), ..., vn=(0,0,...,1), and then the columns of the standard matrix of T were the images T(v1),...,T(vn) (which are equal to the coordinate vectors of T(v1),...,T(vn) in the standard basis). In general, the matrix [T]B of T in the basis B is defined similarly: its columns are the coordinate vectors [T(b1)]B,...,[T(bn)]B. The theorem about characterization of linear transformations from Rm to Rn tells us that the image of every vector v from Rm under the transformation T is equal to the product of the standard matrix of T and the (column) vector v. Almost exactly the same argument proves the following statement. Theorem. If [T]B is the matrix of a linear operator T in a vector space V then for every vector v in V we have:

[T(v)]B = [T]B*[v]B.

In other words: the coordinate vector of the image of v is equal to the product of the matrix [T]B and the coordinate vector of v. The proof is left as an exercise.

Example. Let T be an operator in R2 with the standard matrix

 [ 1 2 ]
 [ 2 1 ]

Let B be the basis consisting of two vectors b1=(1,1) and b2=(1,2) (prove that it is a basis). Let us construct the matrix [T]B of the operator T in the basis B. By definition, the columns of the matrix [T]B are the coordinate vectors of T(b1) and T(b2) in the basis B. It is easy to compute that T(b1)=(3,3)=3b1+0b2. So the coordinate vector of T(b1) in the basis B is (3,0). Now T(b2)=(5,4)=6b1-b2. So the coordinate vector of T(b2) in the basis B is (6,-1). Thus the matrix of the operator T in the basis B is

 [ 3 6 ]
 [ 0 -1 ]
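This example can be checked numerically: if P is the matrix whose columns are b1 and b2 (in standard coordinates), then the matrix of T in the basis B equals the inverse of P times A times P. A numpy check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])           # standard matrix of T
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])           # columns: b1=(1,1), b2=(1,2)

# Matrix of T in the basis B: P^-1 A P.
TB = np.linalg.solve(P, A @ P)
print(np.allclose(TB, [[3.0, 6.0], [0.0, -1.0]]))   # True
```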

## Matrices of linear operators in different bases

The previous example shows that a linear operator can have different matrices in different bases. The goal (very important in many applications of linear algebra to mathematics and physics) is to find a basis in which the matrix of a given linear operator is as simple as possible.

First of all let us find out what happens to the matrix of an operator when we change the basis. Let T be a linear operator in a vector space V and let B1 and B2 be two bases in V. Then we have the transition matrix M(B1,B2) from B1 to B2 and the inverse transition matrix M(B2,B1) from B2 to B1. We know that for every vector v in V we have:

[T(v)]B1 = [T]B1*[v]B1.

We also know that

[v]B1 = M(B2,B1)*[v]B2 and [T(v)]B1 = M(B2,B1)*[T(v)]B2

Using these formulas we get:

M(B2, B1)*[T(v)]B2 = [T]B1*M(B2,B1)*[v]B2

Multiplying by the inverse of M(B2,B1) (that is M(B1,B2)), we get:

[T(v)]B2 = M(B2,B1)^-1 * [T]B1 * M(B2,B1)*[v]B2.

This means that [T]B2, the matrix of the operator T in the basis B2, is equal to M(B2,B1)^-1 * [T]B1 * M(B2,B1). This is the connection we were looking for: if C=[T]B1 is the matrix of T in the old basis, D=[T]B2 is its matrix in the new basis, and M=M(B2,B1) is the transition matrix, then

D = M^-1*C*M.

## Eigenvectors and eigenvalues

The simplest possible matrices are diagonal. When is it possible to find a basis in which the matrix of a given operator is diagonal?

Suppose that an operator T in a vector space V has a diagonal matrix [T]B in a basis B = {b1, b2, ..., bn}:

 [ a1 0  0  0 ... 0 ]
 [ 0  a2 0  0 ... 0 ]
 [ 0  0  a3 0 ... 0 ]
 [ .................. ]
 [ 0  0  0  0 ... an ]

Then T(b1) = a1b1. Indeed, b1 has coordinate vector (1,0,...,0) in the basis B, so T(b1) has coordinate vector [T]B*(1,...,0) which is (a1,0,...,0). Thus T(b1)=a1b1. Similarly, T(b2)=a2b2,..., T(bn)=anbn. This leads to the following important definition. A non-zero vector v is called an eigenvector of a linear operator T with the eigenvalue a if T(v) = av (a is a number).

We see that if an operator has a diagonal matrix in some basis, this basis must consist of eigenvectors of this operator; the numbers on the diagonal of this diagonal matrix are the corresponding eigenvalues.

Examples. Let V=R2. We have considered some types of linear operators in R2. Let us check which of these operators have bases of eigenvectors.

1. Dilations. A dilation operator takes every vector v to kv for some fixed number k. Thus every vector is an eigenvector of the dilation operator. So every basis is a basis of eigenvectors. Notice that the matrix of the dilation operator in any basis is

 [ k 0 ]
 [ 0 k ]

2. Projection on line L. Any non-zero vector b1 which is parallel to L is an eigenvector of the projection with the eigenvalue 1. Any non-zero vector b2 which is perpendicular to L is an eigenvector with the eigenvalue 0. These two vectors form a basis of eigenvectors of the projection. And the matrix of the projection in this basis has the following form:

 [ 1 0 ]
 [ 0 0 ]

3. Reflection about line L. This is similar to the previous case. Any non-zero vector b1 which is parallel to L is an eigenvector with eigenvalue 1 and any non-zero vector b2 which is perpendicular to L is an eigenvector with eigenvalue -1. Thus b1 and b2 form a basis of eigenvectors of the reflection and the matrix of the reflection in this basis is:

 [ 1 0 ]
 [ 0 -1 ]

4. Rotation through some angle (not equal to 0 and 180 degrees). It is easy to see that such a rotation has no eigenvectors: no non-zero vector is taken to a scalar multiple of itself. Thus there are no bases of R2 in which the matrix of the rotation is diagonal.

## How to find eigenvectors and eigenvalues

Let T be a linear operator in an n-dimensional vector space V and let B be any basis in V. Suppose that v is an eigenvector with the eigenvalue a. Then T(v)=av. This means that the coordinate vectors of T(v) and v in the basis B satisfy a similar condition:

[T(v)]B=a[v]B

We can rewrite this equality using the matrix of T in the basis B:

[T]B*[v]B=a[v]B

Let us denote [T]B by A and [v]B by w. Then we have:

A*w=aw.

Recall that here w is an n-vector. A non-zero vector w is called an eigenvector of a matrix A with the eigenvalue a if A*w=aw. Thus the problem of finding eigenvectors and eigenvalues of linear operators in n-dimensional spaces is reduced to the problem of finding eigenvectors and eigenvalues of their matrices.

We can rewrite the equality A*w=aw in the following form:

A*w=aI*w

where I is the identity matrix. Moving the right hand side to the left, we get:

(A-aI)*w=0

This is a homogeneous system of linear equations. If the matrix A has an eigenvector w then this system must have a non-trivial solution (w is not 0!). By the second theorem about invertible matrices this is equivalent to the condition that the matrix A-aI is not invertible. Now by the third theorem this is equivalent to the condition that det(A-aI) is zero. Thus a matrix A has an eigenvector with an eigenvalue a if and only if det(A-aI)=0.

Notice that det(A-aI) is a polynomial in the unknown a. This polynomial is called the characteristic polynomial of the matrix A.

Thus in order to find all eigenvectors of a matrix A, one needs to find the characteristic polynomial of A, find all its roots (=eigenvalues) and for each of these roots a, find the solutions of the homogeneous system (A-aI)*w=0.
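numpy can carry out this procedure. A sketch on a made-up matrix: np.poly returns the coefficients of the characteristic polynomial det(aI - A), which has the same roots as det(A - aI), and np.linalg.eig performs both steps (roots and null spaces) at once.

```python
import numpy as np

# A made-up matrix for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Step 1-2: characteristic polynomial and its roots (= eigenvalues).
char_poly = np.poly(A)               # coefficients of det(aI - A)
eigenvalues = np.roots(char_poly)

# Step 3 is solving (A - aI) w = 0 for each root a; np.linalg.eig
# returns eigenvalues and matching eigenvectors directly.
evals, evecs = np.linalg.eig(A)
for a, w in zip(evals, evecs.T):
    assert np.allclose(A @ w, a * w)     # A w = a w

print(np.allclose(sorted(evals), sorted(eigenvalues)))   # True
```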

Example.
Let T be the linear operator in R2 with the standard matrix

 [ 1 2 ]
 [ 2 1 ]

The characteristic polynomial det(A-aI) is (1-a)^2-4. The roots are a=-1 and a=3. These are the eigenvalues of the operator T.

Let a=-1. Then the system of equations (A-aI)*w=0 has the form

2x+2y=0
2x+2y=0

where w=(x,y). The general solution is:

x=-t
y=t

This solution gives the set of all eigenvectors with the eigenvalue -1. It is a subspace spanned by the vector b1=(-1,1).

Now let a=3. Then the system (A-aI)*w=0 has the form

-2x+2y=0
2x-2y=0

The general solution is:

x=t
y=t

Thus the set of eigenvectors with eigenvalue 3 forms a subspace spanned by the vector (1,1).

Notice that the vectors (-1,1) and (1,1) form a basis in R2 because they are non-proportional and the dimension of R2 is 2. The matrix of the operator T in this basis is diagonal:

 [ -1 0 ]
 [ 0 3 ]
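The whole example can be verified numerically: with P the matrix whose columns are the eigenvectors (-1,1) and (1,1), the product of the inverse of P, A, and P should be the diagonal matrix above. A numpy check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])           # standard matrix of T
P = np.array([[-1.0, 1.0],
              [ 1.0, 1.0]])          # columns: eigenvectors (-1,1), (1,1)

D = np.linalg.solve(P, A @ P)        # P^-1 A P
print(np.allclose(D, [[-1.0, 0.0], [0.0, 3.0]]))   # True
```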