Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold:
1. Vc is a subspace of W.
2. The only vector contained in both V and Vc is the zero vector.
3. Every vector w in W can be represented uniquely as a sum w=v+v' where v is in V and v' is in Vc.
4. dim(V)+dim(Vc)=dim(W).
5. (Vc)c=V.
We shall return to this theorem later.
The following result shows an important connection between orthogonal complements and systems of linear equations.
Theorem. Let V be a subspace in Rn spanned by vectors s1=(a11,...,a1n), s2=(a21,...,a2n),..., sk=(ak1,...,akn). Then the following conditions for a vector v'=(x1,...,xn) are equivalent:
1. v' belongs to the orthogonal complement Vc of V;
2. v' is a solution of the homogeneous system of linear equations

a11x1 + a12x2 + ... + a1nxn = 0
a21x1 + a22x2 + ... + a2nxn = 0
.......................
ak1x1 + ak2x2 + ... + aknxn = 0
Thus Vc consists of all solutions of this system of equations.
The proof is left as
an exercise.
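To make this concrete, here is a small numerical sketch (not part of the notes' exercises): it uses numpy together with scipy.linalg.null_space to compute the orthogonal complement of the span of two arbitrary example vectors in R4 as the solution set of the corresponding homogeneous system.

import numpy as np
from scipy.linalg import null_space

# V is spanned by s1 and s2 (arbitrary example vectors in R^4)
s1 = np.array([1.0, 2.0, 0.0, 1.0])
s2 = np.array([0.0, 1.0, 1.0, 3.0])
A = np.vstack([s1, s2])              # the rows of A are the spanning vectors

Vc_basis = null_space(A)             # columns: an orthonormal basis of the solution set, i.e. of Vc
print(Vc_basis.shape)                # (4, 2), so dim(Vc) = 4 - dim(V) = 2
print(np.allclose(A @ Vc_basis, 0))  # True: every vector of Vc is orthogonal to s1 and s2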
Given an orthogonal basis {v1,...,vn}, one can get an orthonormal basis
by dividing each vi by its length: {v1/||v1||,...,vn/||vn||}.
Suppose that we have a (not necessarily orthogonal) basis {s1,...,sn}
of a Euclidean
vector space V. The next procedure, called the Gram-Schmidt algorithm,
produces an orthogonal basis {v1,...,vn} of V. Let v1=s1.
We shall find v2 as a linear combination of v1 and s2:
v2=s2+xv1. Since v2 must be orthogonal to v1, we have:

<s2,v1> + x<v1,v1> = 0,

so x = -<s2,v1>/<v1,v1> and

v2 = s2 - (<s2,v1>/<v1,v1>) v1.

Next we can find v3 as a linear combination of s3, v1 and v2. A similar
calculation gives that

v3 = s3 - (<s3,v1>/<v1,v1>) v1 - (<s3,v2>/<v2,v2>) v2.

Continuing in this manner, we can get the formula for vi+1:

vi+1 = si+1 - (<si+1,v1>/<v1,v1>) v1 - ... - (<si+1,vi>/<vi,vi>) vi.
By construction, the set of vectors {v1,...,vn} is orthogonal. None of these
vectors is 0. Indeed, if vi were equal to
0 then si, v1,...,vi-1 would be linearly
dependent, which would imply that s1,...,si are linearly dependent (replace
each vj by its expression as a linear combination of s1,...,sj), which is impossible since
{s1,...,sn} is a basis.
This implies
that {v1,...,vn} are linearly independent. Indeed, the following general
theorem holds.
Theorem. Every set of pairwise orthogonal non-zero vectors is linearly independent.
Proof. By contradiction
let {v1,...,vn}
be a linearly dependent set of pairwise orthogonal non-zero vectors.
Then by the theorem about linearly independent sets,

x1v1 + x2v2 + ... + xnvn = 0

for some numbers xi not all of which are zeroes. Suppose that xi is not 0. Take the dot product of the last equality with vi:

x1<v1,vi> + x2<v2,vi> + ... + xn<vn,vi> = 0.

Since vi is orthogonal to every vector vj except vi itself,
each dot product
<vj,vi> is 0 except <vi,vi>. So

xi<vi,vi> = 0.

Since <vi,vi> is not 0 (because vi is not zero), we conclude that xi=0.
This is a contradiction since we assumed that xi is not 0.
Thus {v1,...,vn} is a linearly independent set of vectors in the Euclidean vector space V. Since dim(V)=n, by the theorem
about dimension, this set is a basis in V. Therefore the Gram-Schmidt procedure
gives an orthogonal basis of V.
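As an illustration of the procedure, here is a minimal sketch of the Gram-Schmidt algorithm in Python with numpy; the input basis of R3 is an arbitrary example and the function name gram_schmidt is just a label chosen here.

import numpy as np

def gram_schmidt(S):
    """Turn the rows of S (a basis s1,...,sn) into an orthogonal basis v1,...,vn."""
    V = []
    for s in S:
        v = s.astype(float)
        for u in V:
            # subtract the component of s along the already constructed vector u
            v = v - (np.dot(s, u) / np.dot(u, u)) * u
        V.append(v)
    return np.array(V)

S = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # an arbitrary (non-orthogonal) basis of R^3
V = gram_schmidt(S)
print(np.round(V @ V.T, 10))          # all off-diagonal entries are 0: the rows are pairwise orthogonal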
If a basis {v1,...,vn} is orthogonal, then it is very easy to find
the coordinates of a vector a in this basis. Indeed, suppose that
a=x1v1+...+xnvn. If we multiply this equality by vi, we get:

<a,vi> = xi<vi,vi>

(all other terms are 0 because <vi,vj>=0 if i is not equal to j). Thus

xi = <a,vi>/<vi,vi>.

This is the formula for the i-th coordinate of a (i=1,2,...,n).
Notice that if the basis is orthonormal, the formula is even easier:

xi = <a,vi>

because in this case <vi,vi>=||vi||^2=1.
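A short numerical sketch of this coordinate formula, assuming numpy and an arbitrary orthogonal basis of R2:

import numpy as np

v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])                 # {v1, v2} is an orthogonal basis of R^2
a = np.array([3.0, 5.0])

x1 = np.dot(a, v1) / np.dot(v1, v1)        # i-th coordinate = <a,vi>/<vi,vi>
x2 = np.dot(a, v2) / np.dot(v2, v2)
print(x1, x2)                              # 4.0 -1.0
print(np.allclose(x1 * v1 + x2 * v2, a))   # True: a = x1*v1 + x2*v2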
Now we can return to the theorem about orthogonal complements. Here is its
formulation:
Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold:
1. Vc is a subspace of W.
2. The only vector contained in both V and Vc is the zero vector.
3. Every vector w in W can be represented uniquely as a sum w=v+v' where v is in V and v' is in Vc.
4. dim(V)+dim(Vc)=dim(W).
5. (Vc)c=V.
Proof.
We leave 1 and 2
as exercises.
3,4. Take any basis {s1,...,sk} of V. By the
theorem about dimension, one can add several vectors sk+1,..., sn
and get a basis of W. Let us apply the Gram-Schmidt method to the basis
{s1,...,sn} and get an orthogonal basis {v1,...,vn} of W.
Then by construction, the first k vectors v1,...,vk belong to V (they are linear
combinations of s1,...,sk). By the theorem
about dimension, v1,...,vk form a basis of V. Let us show that vk+1,...,vn form a basis of Vc. Indeed, each of these vectors is orthogonal to each of
v1,...,vk. Therefore each of vk+1,...,vn is orthogonal to every linear
combination of v1,...,vk. So vk+1,...,vn are orthogonal to V. Thus
these vectors belong to Vc. They are linearly independent by the theorem saying that every orthogonal set of non-zero vectors
is linearly independent. Let w be any vector in Vc. Then

w = x1v1 + ... + xnvn

for some numbers x1,...,xn. Multiplying this equality by each of v1,...,vk, we get that x1,...,xk are 0.
So every vector in Vc is a linear combination of vk+1,...,vn. Thus
vk+1,...,vn is a basis of Vc. This immediately gives us the property
4.
Take any vector w in W. Then w=x1v1+...+xnvn=(x1v1+...+xkvk)+(xk+1vk+1+...+xnvn). So every vector in W is a sum of a vector in V and a vector in Vc.
Let us prove that this representation is unique. Suppose (by contradiction)
that

w = u + u' = v + v',

where u and v are from V, u' and v' are from Vc, and either
u is not equal to v or u' is not equal to v'. Then we can deduce
that

v - u = u' - v'.

Notice that since V and Vc are subspaces, v-u belongs to V and u'-v' belongs to Vc. Let us multiply both sides of this equality by v-u. We get:

<v-u, v-u> = <u'-v', v-u>.

The right hand side of this equality is 0 because u'-v' is orthogonal to
v-u (and to any other vector from V). So <v-u, v-u>=0. By a property of dot
products, we conclude that v-u=0, that is u=v. Then u'-v'=v-u=0, so u'=v'. This contradiction completes the proof of part 3.
5. It is clear that (Vc)c contains V. On the other hand, by part 4 (applied to Vc),
dim((Vc)c)=dim(W)-dim(Vc)=dim(V). Since V is contained in (Vc)c and has the same dimension, V=(Vc)c.
The theorem about orthogonal complements tells us
that if V is a subspace of a Euclidean vector space W and w is a vector from W
then w=v+v' for some v in V and v' in the orthogonal complement Vc of V. We
also know that this representation of w is unique.
The vector v is then called the projection of w onto V;
the vector v' is called the normal component
of w.
The Gram-Schmidt procedure gives us the formula for the projection and
the normal component. If v1,...,vk is an orthogonal basis of the subspace V
then

v = (<w,v1>/<v1,v1>) v1 + (<w,v2>/<v2,v2>) v2 + ... + (<w,vk>/<vk,vk>) vk.
Indeed, v belongs to V because each vi is in V and V is closed under
linear combinations. And it is easy to check (exercise) that v'=w-v is orthogonal to each of vi. This implies
that v' is orthogonal to every vector in V because vectors in V are linear
combinations of v1,...,vk (check that if a vector p is orthogonal to vectors q1,...,qn, then p is
orthogonal to any linear combination of q1,...,qn).
The theorem about orthogonal complements allows
us to find distances from vectors to subspaces in any Euclidean vector space.
Theorem. Let V be a subspace in a Euclidean vector space W and let w be
a vector from W. Let w=v+v' where v is the projection of w onto V and v' is
the normal component (as in the theorem about orthogonal complements). Then ||v'|| is the distance from w to V, and v is the vector in V
closest to w.
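The following sketch (numpy; the subspace V of R3 and the vector w are arbitrary examples) computes the projection, the normal component, and the distance from w to V using the formulas above:

import numpy as np

# an orthogonal basis of a plane V in R^3 (arbitrary example)
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.0, -1.0])
w = np.array([2.0, 3.0, 4.0])

# projection of w onto V: sum of (<w,vi>/<vi,vi>) vi
proj = sum(np.dot(w, v) / np.dot(v, v) * v for v in (v1, v2))
normal = w - proj                    # the normal component of w
print(proj)                          # [2. 0. 4.]
print(normal)                        # [0. 3. 0.]
print(np.linalg.norm(normal))        # 3.0, the distance from w to V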
Suppose that a system of linear equations Av=b with the m by n
matrix of coefficients A does not have a solution.
In this case we can
look for a vector v for
which Av is as close as possible to b. Such a vector v is called a least squares solution of the system Av=b. The term is motivated by the fact that
if v is a least squares solution of Av=b,
then the sum of squares of the coordinates (the square of the norm)
of Av-b must be minimal.
The set V of vectors of the form Av is the range
of the linear transformation with the standard matrix A, so it is the
column space of the matrix A.
In fact, if v=(x1,...,xn) then

Av = x1c1 + x2c2 + ... + xncn,

where c1,..., cn are the column vectors of the matrix A.
Thus we need to find
the vector p in V such that the distance from b to p is the smallest.
From the theorem about the distance from a vector
to a subspace we know that p is the projection of b onto
V. Thus in order
to find v we need to execute the following procedure:
1. Find a basis of the column space V of A.
2. Apply the Gram-Schmidt method to this basis to get an orthogonal basis of V.
3. Compute the projection p of b onto V using the projection formula above.
4. Solve the system Av=p (this system is consistent because p belongs to the column space of A).
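A brief sketch of this procedure in Python with numpy; the matrix A and the vector b are arbitrary examples, and instead of carrying out Gram-Schmidt by hand the sketch takes an orthonormal basis of the column space from numpy's QR factorization:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])        # the system Av=b has no solution

Q, R = np.linalg.qr(A)               # columns of Q: an orthonormal basis of the column space of A
p = Q @ (Q.T @ b)                    # projection of b onto the column space
v = np.linalg.solve(R, Q.T @ b)      # solve Av = p using the QR factors
print(v)                             # [ 5. -3.], a least squares solution
print(np.allclose(A @ v, p))         # True: Av equals the projection p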
There exists an alternative procedure for finding a least squares
solution. Notice that v is a least squares solution of the system Av=b
if and only if Av-b is orthogonal to V (the column space of the matrix A).
By the theorem about orthogonal complements in
Rn, a vector z is orthogonal to V if and only if z is a solution of the
system ATz=0, where AT is the transpose of A.
Thus v is a least squares solution of Av=b if and only if
v is a
solution of the system AT(Av-b)=0, or equivalently

(AT A)v = AT b.
Thus in order to find a least squares solution of the system Av=b
one needs to solve another system of linear equations (with the
matrix of coefficients ATA).
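A minimal sketch of this second procedure (numpy; the same arbitrary A and b as above), checked against numpy's built-in least squares solver:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# solve the normal equations (A^T A) v = A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)
print(v)                                     # [ 5. -3.]
print(np.linalg.lstsq(A, b, rcond=None)[0])  # [ 5. -3.], numpy's solver agrees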
There exists yet another procedure for finding a least squares
solution of a system of linear equations Av=b.
We need to minimize the distance
dist(b, Av). This distance is a function of the n variables x1,...,xn. Thus our
problem is just the problem of finding a minimum of a function of n variables.
In fact, since the formula for dist(b, Av) contains a square root, it is more
convenient to minimize the function f(v)=dist(b, Av)^2, which is just a quadratic
polynomial. This can be done by finding the gradient of this function
and solving the system of equations grad(f)=0.
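A sketch of this third approach: since grad(f) = 2AT(Av-b), one can also minimize f numerically by plain gradient descent; the step size and the number of iterations below are arbitrary choices for this small example.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

v = np.zeros(2)
step = 0.1
for _ in range(2000):
    grad = 2 * A.T @ (A @ v - b)     # gradient of f(v) = ||Av - b||^2
    v = v - step * grad
print(np.round(v, 6))                # approaches [ 5. -3.], the least squares solution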
We know that vectors in different bases may have different coordinates,
thus when we deal with vector spaces we often need to change a basis to
get better coordinates for the vectors we are working with.
Let B={v1,...,vn} and B'={s1,...,sn} be bases of a vector space V.
Then every vector from B is a linear combination of vectors from B':

v1 = a11s1 + a12s2 + ... + a1nsn
v2 = a21s1 + a22s2 + ... + a2nsn
.......................
vn = an1s1 + an2s2 + ... + annsn
Consider the following matrix A:
[ a11  a21  ...  an1 ]
[ a12  a22  ...  an2 ]
[ ...  ...  ...  ... ]
[ a1n  a2n  ...  ann ]
This matrix is called the
transition (transformation) matrix from B to B'.
Take any vector w in V. Suppose that w has coordinates (x1,...,xn) in
the basis B. This means that w=x1v1+...+xnvn. By substituting
the expressions of vi into this equality, we can get the expression of
w as a linear combination of vectors from B'. The coefficients in
this linear combination are the coordinates of w in the basis B'. It is easy to
compute that the vector of coordinates that we get will be equal to Av
where v=(x1,...,xn). Thus in
order to get the coordinate vector v' of w
in the
basis B', one needs to multiply the transition matrix A
from B to B' by
the coordinate vector v of w in the basis B:

v' = Av.
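A small sketch of this change of coordinates in R2 (numpy); the bases B and B' are arbitrary examples, and the transition matrix is assembled column by column by solving for the coordinates of each vector of B in the basis B':

import numpy as np

# basis B = {v1, v2} and basis B' = {s1, s2} of R^2 (arbitrary examples)
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, 2.0])
s1, s2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])

Sp = np.column_stack([s1, s2])
# column i of the transition matrix = coordinates of vi in the basis B'
A = np.column_stack([np.linalg.solve(Sp, v) for v in (v1, v2)])
print(A)                                              # [[ 0. -1.]
                                                      #  [ 1.  2.]]
x_B = np.array([2.0, 3.0])                            # coordinates of w in B, i.e. w = 2*v1 + 3*v2
w = 2 * v1 + 3 * v2
x_Bp = A @ x_B                                        # coordinates of w in B'
print(np.allclose(x_Bp[0] * s1 + x_Bp[1] * s2, w))    # True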
Let B={b1,...,bn} be a basis in an n-dimensional vector space V.
For every vector v in V let [v]B denote the column vector of coordinates
of v in the basis B. If T is a linear operator in V, then the matrix whose columns are the coordinate vectors [T(b1)]B, [T(b2)]B, ..., [T(bn)]B is called the matrix of T in the basis B and is denoted by [T]B.
Recall
that we constructed the standard
matrix of a linear operator T in Rn in a similar manner: we took the standard
basis in Rn, v1=(1,0,...,0), ..., vn=(0,0,...,1), and then the columns of the standard
matrix of T were the images T(v1),...,T(vn) (which are equal to the coordinate
vectors of T(v1),...,T(vn) in the standard basis).
The theorem
about characterization of linear transformations from Rm to Rn tells us
that the image of every vector v from Rm under the transformation T is equal to
the product of the standard matrix of T and the (column) vector v. Almost
exactly the same argument proves the following statement.
Theorem. If [T]B is the matrix of a linear operator T in a vector space V
then for every vector v in V we have:

[T(v)]B = [T]B * [v]B.
In other words: the coordinate vector of the image of v
is equal to the
product of the matrix [T]B and the coordinate vector
of v.
The proof is left as an exercise.
Example. Let T be an operator in R2 with the standard matrix
[ 1  2 ]
[ 2  1 ]
Let B be the basis consisting of two vectors b1=(1,1) and b2=(1,2)
(prove that it is a basis).
Let us construct the matrix [T]B of the operator T in the basis B.
By definition, the columns of the matrix [T]B are the coordinate vectors of
T(b1) and T(b2) in the basis B. It is easy to compute that
T(b1)=(3,3)=3b1+0b2. So the coordinate vector of T(b1) in the basis B
is (3,0).
Now T(b2)=(5,4)=6b1-b2. So the coordinate vector of T(b2) in the basis B
is (6, -1). Thus the matrix of the operator T in the basis B is
[ 3   6 ]
[ 0  -1 ]
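This computation can be checked numerically; in the sketch below (numpy) the coordinate vectors of T(b1) and T(b2) in the basis B are obtained by solving a linear system whose matrix has b1 and b2 as columns:

import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 1.0]])               # the standard matrix of T
b1, b2 = np.array([1.0, 1.0]), np.array([1.0, 2.0])
P = np.column_stack([b1, b2])            # the columns are the vectors of the basis B

# columns of [T]_B = coordinate vectors of T(b1), T(b2) in the basis B
T_B = np.column_stack([np.linalg.solve(P, T @ b) for b in (b1, b2)])
print(T_B)                               # [[ 3.  6.]
                                         #  [ 0. -1.]]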
The previous example shows that a linear operator can have different
matrices in different bases. The goal (very important in many applications of linear algebra
to mathematics and physics) is to find a basis in which the matrix of
a given linear operator is as simple as possible.
First of all let us find out what happens to the matrix of an operator
when we change the basis. Let T be a linear operator in a vector space V
and let B1 and B2 be two bases in V. Then we have the transition matrix
M(B1,B2) from B1 to B2 and the inverse transition matrix M(B2,B1)
from B2 to B1. We know that for every vector v in V we have:

[T(v)]B1 = [T]B1 * [v]B1   and   [T(v)]B2 = [T]B2 * [v]B2.

We also know that

[v]B1 = M(B2,B1) * [v]B2   and   [T(v)]B1 = M(B2,B1) * [T(v)]B2.

Using these formulas we get:

M(B2,B1) * [T]B2 * [v]B2 = M(B2,B1) * [T(v)]B2 = [T(v)]B1 = [T]B1 * [v]B1 = [T]B1 * M(B2,B1) * [v]B2.

Since this holds for every coordinate vector [v]B2, we get M(B2,B1) * [T]B2 = [T]B1 * M(B2,B1). Multiplying by the inverse of M(B2,B1) (that is M(B1,B2)), we get:

[T]B2 = M(B2,B1)^(-1) * [T]B1 * M(B2,B1).

This means that [T]B2, the matrix of the operator T in the basis B2,
is equal to M(B2,B1)^(-1) * [T]B1 * M(B2,B1). This is the connection we were
looking for.
Two matrices C and D are called similar if
there exists an invertible matrix M such that

D = M^(-1) * C * M.

Thus we can say that the matrices of a linear operator in two different bases are similar.
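For the operator from the example above this formula can be verified directly: taking B1 to be the standard basis and B2=B={(1,1),(1,2)}, the transition matrix M(B2,B1) has b1 and b2 as its columns, so [T]B should equal M(B2,B1)^(-1) * [T] * M(B2,B1). A sketch with numpy:

import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 1.0]])                       # [T] in the standard basis
M = np.column_stack([[1.0, 1.0], [1.0, 2.0]])    # transition matrix from B to the standard basis

T_B = np.linalg.inv(M) @ T @ M                   # [T]_B = M^(-1) * [T] * M
print(T_B)                                       # [[ 3.  6.]
                                                 #  [ 0. -1.]]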
The simplest possible matrices are diagonal. When is it possible to find a basis in which the matrix
of a given operator is diagonal?
Suppose that an operator T in a vector space V has a diagonal matrix [T]B
in a basis B = {b1, b2, ..., bn}:
[ a1   0    0    0   ...   0  ]
[ 0    a2   0    0   ...   0  ]
[ 0    0    a3   0   ...   0  ]
[ .......................    ]
[ 0    0    0    0   ...   an ]

Then, by the definition of [T]B, T(b1)=a1b1, T(b2)=a2b2, ..., T(bn)=anbn, so each bi is an eigenvector of T with eigenvalue ai.
We see that
if an operator has a diagonal matrix in some basis,
this basis must consist of eigenvectors of this operator; the
numbers on the diagonal of this diagonal matrix are the corresponding
eigenvalues.
Examples. Let V=R2. We have considered
some types of linear operators in R2. Let us check which of these operators
have bases of eigenvectors.
1. Dilations. A dilation operator takes every vector v to kv for some
fixed number k. Thus every vector is an eigenvector of the dilation operator.
So every basis is a basis of eigenvectors. Notice that the matrix of the dilation operator in any basis is
[ k  0 ]
[ 0  k ]
2. Orthogonal projection onto a line L. Any vector b1 which is parallel to L is an eigenvector with eigenvalue 1,
and any vector b2 which is perpendicular to L is an eigenvector with eigenvalue 0. Thus b1 and b2 form a basis of eigenvectors of the projection,
and the matrix of the projection in this basis is

[ 1  0 ]
[ 0  0 ]
3. Reflection about a line L. This is similar to the previous case.
Any vector b1 which is parallel to L is an eigenvector with eigenvalue 1
and any vector b2 which is perpendicular to L is an eigenvector with
eigenvalue -1. Thus b1 and b2 form a basis of eigenvectors of the reflection
and the matrix of the reflection in this basis is:

[ 1   0 ]
[ 0  -1 ]
4. Rotation through some angle (not equal to 0 and 180 degrees).
It is easy to see that the rotation does not have eigenvectors: no
non-zero vector is taken by the rotation to a scalar multiple of itself. Thus there are no bases of R2
in which the matrix of the rotation is diagonal.
Let T be a linear operator in an n-dimensional vector space V and let B
be any basis in V. Suppose that v is an eigenvector with the eigenvalue a.
Then T(v)=av. This means that the coordinate vectors of T(v) and v in the basis B
satisfy a similar condition:

[T]B * [v]B = [T(v)]B = [av]B = a[v]B.

Let us denote [T]B by A and [v]B by w. Then we have:

A*w = aw.
Recall that here w is an n-vector. A non-zero
vector w is
called an eigenvector of a matrix A with the eigenvalue a if
A*w=aw. Thus the problem of finding eigenvectors and eigenvalues
of linear operators in
n-dimensional spaces is reduced to the problem of finding eigenvectors and eigenvalues of their matrices.
We can rewrite the equality A*w=aw in the following form:

A*w = aI*w,

where I is the identity matrix. Moving the right hand side to the left,
we get:

(A - aI)*w = 0.
This is a homogeneous system of linear equations. If the matrix A has
an eigenvector w then this system must have a non-trivial solution (w is not
0 !). By the second
theorem
about invertible matrices this is equivalent to the condition
that the matrix A-aI is not invertible.
Now by the third
theorem this is equivalent to the condition that det(A-aI) is equal to zero.
Thus
a matrix A has an eigenvector with an eigenvalue a if and only if
det(A-aI)=0.
Notice that det(A-aI) is a polynomial in the unknown a. This polynomial
is called the characteristic polynomial of the matrix A.
Thus
in order to find all eigenvectors of a matrix A, one needs to find the
characteristic polynomial of A, find all its roots (these are the eigenvalues),
and for each of these
roots a, find the solutions of the homogeneous system (A-aI)*w=0.
Example. Let T be the linear operator in R2 with the standard
matrix

[ 1  2 ]
[ 2  1 ]

The characteristic polynomial det(A-aI) of this matrix A is
(1-a)^2-4. The roots are a=-1 and a=3. These are the eigenvalues of the operator T. Let a=-1. Then the system of equations (A-aI)*w=0 has the
form

2x + 2y = 0
2x + 2y = 0

where w=(x,y). The general solution is:

x = -t, y = t.

This solution gives the set of all eigenvectors with the eigenvalue -1.
It is a subspace spanned by the vector b1=(-1,1). Now let a=3. Then the system (A-aI)*w=0 has the form

-2x + 2y = 0
 2x - 2y = 0

The general solution is:

x = t, y = t.

Thus the set of eigenvectors with eigenvalue 3 forms a subspace
spanned by the vector (1,1).
Notice that the vectors (-1,1) and (1,1) form a basis in R2
because they are non-proportional and the dimension of R2 is 2.
The matrix of the operator T
in this basis is diagonal:

[ -1  0 ]
[  0  3 ]
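The hand computation in this example can be reproduced with numpy; numpy.linalg.eig returns unit eigenvectors, so they come out as scalar multiples of (-1,1) and (1,1), and the order of the eigenvalues may differ.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # [ 3. -1.] (order may differ)
print(eigenvectors)                  # columns: unit eigenvectors, proportional to (1,1) and (-1,1)

# diagonalization: P^(-1) A P is the diagonal matrix of eigenvalues
P = np.column_stack([[-1.0, 1.0], [1.0, 1.0]])   # columns: the eigenvectors b1=(-1,1), b2=(1,1)
print(np.linalg.inv(P) @ A @ P)      # [[-1.  0.]
                                     #  [ 0.  3.]]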