Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold:
1. Vc is a subspace of W.
2. The only vector contained in both V and Vc is the zero vector.
3. Every vector w in W can be represented uniquely as a sum w=v+v' where v is in V and v' is in Vc.
4. dim(V)+dim(Vc)=dim(W).
5. (Vc)c=V.
We shall return to this theorem later.
The following result shows an important connection between orthogonal complements and systems of linear equations.
Theorem. Let V be a subspace in Rn spanned by vectors s1=(a11,...,a1n), s2=(a21,...,a2n),..., sk=(ak1,...,akn). Then the following conditions for a vector v'=(x1,...,xn) are equivalent:
1. v' belongs to the orthogonal complement Vc of V;
2. v' is a solution of the homogeneous system of linear equations

a11x1 + a12x2 + ... + a1nxn = 0
a21x1 + a22x2 + ... + a2nxn = 0
.......................
ak1x1 + ak2x2 + ... + aknxn = 0
Thus Vc consists of all solutions of this system of equations.
The proof is left as
an exercise.
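To make this concrete, here is a small numerical sketch (not part of the notes' exercises): it uses numpy together with scipy.linalg.null_space to compute the orthogonal complement of the span of two arbitrary example vectors in R4 as the solution set of the corresponding homogeneous system.

import numpy as np
from scipy.linalg import null_space

# V is spanned by s1 and s2 (arbitrary example vectors in R^4)
s1 = np.array([1.0, 2.0, 0.0, 1.0])
s2 = np.array([0.0, 1.0, 1.0, 3.0])
A = np.vstack([s1, s2])              # the rows of A are the spanning vectors

Vc_basis = null_space(A)             # columns: an orthonormal basis of the solution set, i.e. of Vc
print(Vc_basis.shape)                # (4, 2), so dim(Vc) = 4 - dim(V) = 2
print(np.allclose(A @ Vc_basis, 0))  # True: every vector of Vc is orthogonal to s1 and s2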
Given an orthogonal basis {v1,...,vn}, one can get an orthonormal basis
by dividing each vi by its length: {v1/||v1||,...,vn/||vn||}.
Suppose that we have a (not necessarily orthogonal) basis {s1,...,sn}
of a Euclidean
vector space V. The next procedure, called the Gram-Schmidt algorithm,
produces an orthogonal basis {v1,...,vn} of V. Let v1=s1.
We shall find v2 as a linear combination of v1 and s2:
v2=s2+xv1. Since v2 must be orthogonal to v1, we have:

<s2,v1> + x<v1,v1> = 0,

so x = -<s2,v1>/<v1,v1> and

v2 = s2 - (<s2,v1>/<v1,v1>) v1.

Next we can find v3 as a linear combination of s3, v1 and v2. A similar
calculation gives that

v3 = s3 - (<s3,v1>/<v1,v1>) v1 - (<s3,v2>/<v2,v2>) v2.

Continuing in this manner, we can get the formula for vi+1:

vi+1 = si+1 - (<si+1,v1>/<v1,v1>) v1 - ... - (<si+1,vi>/<vi,vi>) vi.
By construction, the set of vectors {v1,...,vn} is orthogonal. None of these
vectors is 0. Indeed, if vi were equal to
0 then si, v1,...,vi-1 would be linearly
dependent, which would imply that s1,...,si are linearly dependent (replace
each vj by its expression as a linear combination of s1,...,sj), which is impossible since
{s1,...,sn} is a basis.
This implies
that {v1,...,vn} are linearly independent. Indeed, the following general
theorem holds.
Theorem. Every set of pairwise orthogonal non-zero vectors is linearly independent.
Proof. By contradiction
let {v1,...,vn}
be a linearly dependent set of pairwise orthogonal non-zero vectors.
Then by the theorem about linearly independent sets,

x1v1 + x2v2 + ... + xnvn = 0

for some numbers xi not all of which are zeroes. Suppose that xi is not 0. Take the dot product of the last equality with vi:

x1<v1,vi> + x2<v2,vi> + ... + xn<vn,vi> = 0.

Since vi is orthogonal to every vector vj except vi itself,
each dot product
<vj,vi> is 0 except <vi,vi>. So

xi<vi,vi> = 0.

Since <vi,vi> is not 0 (because vi is not zero), we conclude that xi=0.
This is a contradiction since we assumed that xi is not 0.
Thus {v1,...,vn} is a linearly independent set of vectors in the Euclidean vector space V. Since dim(V)=n, by the theorem
about dimension, this set is a basis in V. Therefore the Gram-Schmidt procedure
gives an orthogonal basis of V.
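As an illustration of the procedure, here is a minimal sketch of the Gram-Schmidt algorithm in Python with numpy; the input basis of R3 is an arbitrary example and the function name gram_schmidt is just a label chosen here.

import numpy as np

def gram_schmidt(S):
    """Turn the rows of S (a basis s1,...,sn) into an orthogonal basis v1,...,vn."""
    V = []
    for s in S:
        v = s.astype(float)
        for u in V:
            # subtract the component of s along the already constructed vector u
            v = v - (np.dot(s, u) / np.dot(u, u)) * u
        V.append(v)
    return np.array(V)

S = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # an arbitrary (non-orthogonal) basis of R^3
V = gram_schmidt(S)
print(np.round(V @ V.T, 10))          # all off-diagonal entries are 0: the rows are pairwise orthogonal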
If a basis {v1,...,vn} is orthogonal, then it is very easy to find
the coordinates of a vector a in this basis. Indeed, suppose that
a=x1v1+...+xnvn. If we multiply this equality by vi, we get:

<a,vi> = xi<vi,vi>

(all other terms are 0 because <vi,vj>=0 if i is not equal to j). Thus

xi = <a,vi>/<vi,vi>.

This is the formula for the i-th coordinate of a (i=1,2,...,n).
Notice that if the basis is orthonormal, the formula is even easier:

xi = <a,vi>

because in this case <vi,vi>=||vi||^2=1.
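A short numerical sketch of this coordinate formula, assuming numpy and an arbitrary orthogonal basis of R2:

import numpy as np

v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])                 # {v1, v2} is an orthogonal basis of R^2
a = np.array([3.0, 5.0])

x1 = np.dot(a, v1) / np.dot(v1, v1)        # i-th coordinate = <a,vi>/<vi,vi>
x2 = np.dot(a, v2) / np.dot(v2, v2)
print(x1, x2)                              # 4.0 -1.0
print(np.allclose(x1 * v1 + x2 * v2, a))   # True: a = x1*v1 + x2*v2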
Now we can return to the theorem about orthogonal complements. Here is its
formulation:
Theorem. Let Vc be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold:
1. Vc is a subspace of W.
2. The only vector contained in both V and Vc is the zero vector.
3. Every vector w in W can be represented uniquely as a sum w=v+v' where v is in V and v' is in Vc.
4. dim(V)+dim(Vc)=dim(W).
5. (Vc)c=V.
Proof.
We leave 1 and 2
as exercises.
3,4. Take any basis {s1,...,sk} of V. By the
theorem about dimension, one can add several vectors sk+1,..., sn
and get a basis of W. Let us apply the Gram-Schmidt method to the basis
{s1,...,sn} and get an orthogonal basis {v1,...,vn} of W.
Then by construction, the first k vectors v1,...,vk belong to V (they are linear
combinations of s1,...,sk). By the theorem
about dimension, v1,...,vk form a basis of V. Let us show that vk+1,...,vn form a basis of Vc. Indeed, each of these vectors is orthogonal to each of
v1,...,vk. Therefore each of vk+1,...,vn is orthogonal to every linear
combination of v1,...,vk. So vk+1,...,vn are orthogonal to V. Thus
these vectors belong to Vc. They are linearly independent by the theorem saying that every orthogonal set of non-zero vectors
is linearly independent. Let w be any vector in Vc. Then

w = x1v1 + ... + xnvn

for some numbers x1,...,xn. Multiplying this equality by each of v1,...,vk, we get that x1,...,xk are 0.
So every vector in Vc is a linear combination of vk+1,...,vn. Thus
vk+1,...,vn is a basis of Vc. This immediately gives us the property
4.
Take any vector w in W. Then w=x1v1+...+xnvn=(x1v1+...+xkvk)+(xk+1vk+1+...+xnvn). So every vector in W is a sum of a vector in V and a vector in Vc.
Let us prove that this representation is unique. Suppose (by contradiction)
that

w = u + u' = v + v',

where u and v are from V, u' and v' are from Vc, and either
u is not equal to v or u' is not equal to v'. Then we can deduce
that

v - u = u' - v'.

Notice that since V and Vc are subspaces, v-u belongs to V and u'-v' belongs to Vc. Let us multiply both sides of this equality by v-u. We get:

<v-u, v-u> = <u'-v', v-u>.

The right hand side of this equality is 0 because u'-v' is orthogonal to
v-u (and to any other vector from V). So <v-u, v-u>=0. By a property of dot
products, we conclude that v-u=0, that is u=v. Then u'-v'=v-u=0, so u'=v'. This contradiction completes the proof of part 3.
5. It is clear that (Vc)c contains V. On the other hand, by part 4 (applied to Vc),
dim((Vc)c)=dim(W)-dim(Vc)=dim(V). Since V is contained in (Vc)c and has the same dimension, V=(Vc)c.
The theorem about orthogonal complements tells us
that if V is a subspace of a Euclidean vector space W and w is a vector from W
then w=v+v' for some v in V and v' in the orthogonal complement Vc of V. We
also know that this representation of w is unique.
The vector v is then called the projection of w onto V;
the vector v' is called the normal component
of w.
The Gram-Schmidt procedure gives us the formula for the projection and
the normal component. If v1,...,vk is an orthogonal basis of the subspace V
then

v = (<w,v1>/<v1,v1>) v1 + (<w,v2>/<v2,v2>) v2 + ... + (<w,vk>/<vk,vk>) vk.
Indeed, v belongs to V because each vi is in V and V is closed under
linear combinations. And it is easy to check (exercise) that v'=w-v is orthogonal to each of vi. This implies
that v' is orthogonal to every vector in V because vectors in V are linear
combinations of v1,...,vk (check that if a vector p is orthogonal to vectors q1,...,qn, then p is
orthogonal to any linear combination of q1,...,qn).
The theorem about orthogonal complements allows
us to find distances from vectors to subspaces in any Euclidean vector space.
Theorem. Let V be a subspace in a Euclidean vector space W and let w be
a vector from W. Let w=v+v' where v is the projection of w onto V and v' is
the normal component (as in the theorem about orthogonal complements). Then ||v'|| is the distance from w to V, and v is the vector in V
closest to w.
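The following sketch (numpy; the subspace V of R3 and the vector w are arbitrary examples) computes the projection, the normal component, and the distance from w to V using the formulas above:

import numpy as np

# an orthogonal basis of a plane V in R^3 (arbitrary example)
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.0, -1.0])
w = np.array([2.0, 3.0, 4.0])

# projection of w onto V: sum of (<w,vi>/<vi,vi>) vi
proj = sum(np.dot(w, v) / np.dot(v, v) * v for v in (v1, v2))
normal = w - proj                    # the normal component of w
print(proj)                          # [2. 0. 4.]
print(normal)                        # [0. 3. 0.]
print(np.linalg.norm(normal))        # 3.0, the distance from w to V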
Suppose that a system of linear equations Av=b with the m by n
matrix of coefficients A does not have a solution.
In this case we can
look for a vector v for
which Av is as close as possible to b. Such a vector v is called a least squares solution of the system Av=b. The term is motivated by the fact that
if v is a least squares solution of Av=b,
then the sum of squares of the coordinates (the square of the norm)
of Av-b must be minimal.
The set V of vectors of the form Av is the range
of the linear transformation with the standard matrix A, so it is the
column space of the matrix A.
In fact, if v=(x1,...,xn) then

Av = x1c1 + x2c2 + ... + xncn,

where c1,..., cn are the column vectors of the matrix A.
Thus we need to find
the vector p in V such that the distance from b to p is the smallest.
From the theorem about the distance from a vector
to a subspace we know that p is the projection of b onto
V. Thus in order
to find v we need to execute the following procedure:
1. Find a basis of the column space V of A.
2. Apply the Gram-Schmidt method to this basis to get an orthogonal basis of V.
3. Compute the projection p of b onto V using the projection formula above.
4. Solve the system Av=p (this system is consistent because p belongs to the column space of A).
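A brief sketch of this procedure in Python with numpy; the matrix A and the vector b are arbitrary examples, and instead of carrying out Gram-Schmidt by hand the sketch takes an orthonormal basis of the column space from numpy's QR factorization:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])        # the system Av=b has no solution

Q, R = np.linalg.qr(A)               # columns of Q: an orthonormal basis of the column space of A
p = Q @ (Q.T @ b)                    # projection of b onto the column space
v = np.linalg.solve(R, Q.T @ b)      # solve Av = p using the QR factors
print(v)                             # [ 5. -3.], a least squares solution
print(np.allclose(A @ v, p))         # True: Av equals the projection p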
There exists an alternative procedure for finding a least squares
solution. Notice that v is a least squares solution of the system Av=b
if and only if Av-b is orthogonal to V (the column space of the matrix A).
By the theorem about orthogonal complements in
Rn, a vector z is orthogonal to V if and only if z is a solution of the
system ATz=0, where AT is the transpose of A.
Thus v is a least squares solution of Av=b if and only if
v is a
solution of the system AT(Av-b)=0, or equivalently

(AT A)v = AT b.
Thus in order to find a least squares solution of the system Av=b
one needs to solve another system of linear equations (with the
matrix of coefficients ATA).
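A minimal sketch of this second procedure (numpy; the same arbitrary A and b as above), checked against numpy's built-in least squares solver:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# solve the normal equations (A^T A) v = A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)
print(v)                                     # [ 5. -3.]
print(np.linalg.lstsq(A, b, rcond=None)[0])  # [ 5. -3.], numpy's solver agrees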
There exists yet another procedure for finding a least squares
solution of a system of linear equations Av=b.
We need to minimize the distance
dist(b, Av). This distance is a function of the n variables x1,...,xn. Thus our
problem is just the problem of finding a minimum of a function of n variables.
In fact, since the formula for dist(b, Av) contains a square root, it is more
convenient to minimize the function f(v)=dist(b, Av)^2, which is just a quadratic
polynomial. This can be done by finding the gradient of this function
and solving the system of equations grad(f)=0.
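A sketch of this third approach: since grad(f) = 2AT(Av-b), one can also minimize f numerically by plain gradient descent; the step size and the number of iterations below are arbitrary choices for this small example.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

v = np.zeros(2)
step = 0.1
for _ in range(2000):
    grad = 2 * A.T @ (A @ v - b)     # gradient of f(v) = ||Av - b||^2
    v = v - step * grad
print(np.round(v, 6))                # approaches [ 5. -3.], the least squares solution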
We know that vectors in different bases may have different coordinates,
thus when we deal with vector spaces we often need to change a basis to
get better coordinates for the vectors we are working with.
Let B={v1,...,vn} and B'={s1,...,sn} be bases of a vector space V.
Then every vector from B is a linear combination of vectors from B':

v1 = a11s1 + a12s2 + ... + a1nsn
v2 = a21s1 + a22s2 + ... + a2nsn
.......................
vn = an1s1 + an2s2 + ... + annsn
Consider the following matrix A:
[ a11  a21  ...  an1 ]
[ a12  a22  ...  an2 ]
[ ...  ...  ...  ... ]
[ a1n  a2n  ...  ann ]
This matrix is called the
transition (transformation) matrix from B to B'.
Take any vector w in V. Suppose that w has coordinates (x1,...,xn) in
the basis B. This means that w=x1v1+...+xnvn. By substituting
the expressions of vi into this equality, we can get the expression of
w as a linear combination of vectors from B'. The coefficients in
this linear combination are the coordinates of w in the basis B'. It is easy to
compute that the vector of coordinates that we get will be equal to Av
where v=(x1,...,xn). Thus in
order to get the coordinate vector v' of w
in the
basis B', one needs to multiply the transition matrix A
from B to B' by
the coordinate vector v of w in the basis B:

v' = Av.
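A small sketch of this change of coordinates in R2 (numpy); the bases B and B' are arbitrary examples, and the transition matrix is assembled column by column by solving for the coordinates of each vector of B in the basis B':

import numpy as np

# basis B = {v1, v2} and basis B' = {s1, s2} of R^2 (arbitrary examples)
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, 2.0])
s1, s2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])

Sp = np.column_stack([s1, s2])
# column i of the transition matrix = coordinates of vi in the basis B'
A = np.column_stack([np.linalg.solve(Sp, v) for v in (v1, v2)])
print(A)                                              # [[ 0. -1.]
                                                      #  [ 1.  2.]]
x_B = np.array([2.0, 3.0])                            # coordinates of w in B, i.e. w = 2*v1 + 3*v2
w = 2 * v1 + 3 * v2
x_Bp = A @ x_B                                        # coordinates of w in B'
print(np.allclose(x_Bp[0] * s1 + x_Bp[1] * s2, w))    # True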
Let B={b1,...,bn} be a basis in an n-dimensional vector space V.
For every vector v in V let [v]B denote the column vector of coordinates
of v in the basis B. If T is a linear operator in V, then the matrix whose columns are the coordinate vectors [T(b1)]B, [T(b2)]B, ..., [T(bn)]B is called the matrix of T in the basis B and is denoted by [T]B.
Recall
that we constructed the standard
matrix of a linear operator T in Rn in a similar manner: we took the standard
basis in Rn, v1=(1,0,...,0), ..., vn=(0,0,...,1), and then the columns of the standard
matrix of T were the images T(v1),...,T(vn) (which are equal to the coordinate
vectors of T(v1),...,T(vn) in the standard basis).
The theorem
about characterization of linear transformations from Rm to Rn tells us
that the image of every vector v from Rm under the transformation T is equal to
the product of the standard matrix of T and the (column) vector v. Almost
exactly the same argument proves the following statement.
Theorem. If [T]B is the matrix of a linear operator T in a vector space V
then for every vector v in V we have:

[T(v)]B = [T]B * [v]B.
In other words: the coordinate vector of the image of v
is equal to the
product of the matrix [T]B and the coordinate vector
of v.
The proof is left as an exercise.
Example. Let T be an operator in R2 with the standard matrix
[ 1  2 ]
[ 2  1 ]
Let B be the basis consisting of two vectors b1=(1,1) and b2=(1,2)
(prove that it is a basis).
Let us construct the matrix [T]B of the operator T in the basis B.
By definition, the columns of the matrix [T]B are the coordinate vectors of
T(b1) and T(b2) in the basis B. It is easy to compute that
T(b1)=(3,3)=3b1+0b2. So the coordinate vector of T(b1) in the basis B
is (3,0).
Now T(b2)=(5,4)=6b1-b2. So the coordinate vector of T(b2) in the basis B
is (6, -1). Thus the matrix of the operator T in the basis B is
[ 3   6 ]
[ 0  -1 ]
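This computation can be checked numerically; in the sketch below (numpy) the coordinate vectors of T(b1) and T(b2) in the basis B are obtained by solving a linear system whose matrix has b1 and b2 as columns:

import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 1.0]])               # the standard matrix of T
b1, b2 = np.array([1.0, 1.0]), np.array([1.0, 2.0])
P = np.column_stack([b1, b2])            # the columns are the vectors of the basis B

# columns of [T]_B = coordinate vectors of T(b1), T(b2) in the basis B
T_B = np.column_stack([np.linalg.solve(P, T @ b) for b in (b1, b2)])
print(T_B)                               # [[ 3.  6.]
                                         #  [ 0. -1.]]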
The previous example shows that a linear operator can have different
matrices in different bases. The goal (very important in many applications of linear algebra
to mathematics and physics) is to find a basis in which the matrix of
a given linear operator is as simple as possible.
First of all let us find out what happens to the matrix of an operator
when we change the basis. Let T be a linear operator in a vector space V
and let B1 and B2 be two bases in V. Then we have the transition matrix
M(B1,B2) from B1 to B2 and the inverse transition matrix M(B2,B1)
from B2 to B1. We know that for every vector v in V we have:

[T(v)]B1 = [T]B1 * [v]B1   and   [T(v)]B2 = [T]B2 * [v]B2.

We also know that

[v]B1 = M(B2,B1) * [v]B2   and   [T(v)]B1 = M(B2,B1) * [T(v)]B2.

Using these formulas we get:

M(B2,B1) * [T]B2 * [v]B2 = M(B2,B1) * [T(v)]B2 = [T(v)]B1 = [T]B1 * [v]B1 = [T]B1 * M(B2,B1) * [v]B2.

Since this holds for every coordinate vector [v]B2, we get M(B2,B1) * [T]B2 = [T]B1 * M(B2,B1). Multiplying by the inverse of M(B2,B1) (that is M(B1,B2)), we get:

[T]B2 = M(B2,B1)^(-1) * [T]B1 * M(B2,B1).

This means that [T]B2, the matrix of the operator T in the basis B2,
is equal to M(B2,B1)^(-1) * [T]B1 * M(B2,B1). This is the connection we were
looking for.
Two matrices C and D are called similar if
there exists an invertible matrix M such that

D = M^(-1) * C * M.

Thus we can say that the matrices of a linear operator in two different bases are similar.
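For the operator from the example above this formula can be verified directly: taking B1 to be the standard basis and B2=B={(1,1),(1,2)}, the transition matrix M(B2,B1) has b1 and b2 as its columns, so [T]B should equal M(B2,B1)^(-1) * [T] * M(B2,B1). A sketch with numpy:

import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 1.0]])                       # [T] in the standard basis
M = np.column_stack([[1.0, 1.0], [1.0, 2.0]])    # transition matrix from B to the standard basis

T_B = np.linalg.inv(M) @ T @ M                   # [T]_B = M^(-1) * [T] * M
print(T_B)                                       # [[ 3.  6.]
                                                 #  [ 0. -1.]]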
The simplest possible matrices are diagonal. When is it possible to find a basis in which the matrix
of a given operator is diagonal?
Suppose that an operator T in a vector space V has a diagonal matrix [T]B
in a basis B = {b1, b2, ..., bn}:
[ a1   0    0    0   ...   0  ]
[ 0    a2   0    0   ...   0  ]
[ 0    0    a3   0   ...   0  ]
[ .......................    ]
[ 0    0    0    0   ...   an ]

Then, by the definition of [T]B, T(b1)=a1b1, T(b2)=a2b2, ..., T(bn)=anbn, so each bi is an eigenvector of T with eigenvalue ai.
We see that
if an operator has a diagonal matrix in some basis,
this basis must consist of eigenvectors of this operator; the
numbers on the diagonal of this diagonal matrix are the corresponding
eigenvalues.
Examples. Let V=R2. We have considered
some types of linear operators in R2. Let us check which of these operators
have bases of eigenvectors.
1. Dilations. A dilation operator takes every vector v to kv for some
fixed number k. Thus every vector is an eigenvector of the dilation operator.
So every basis is a basis of eigenvectors. Notice that the matrix of the dilation operator in any basis is
[ k  0 ]
[ 0  k ]
2. Orthogonal projection onto a line L. Any vector b1 which is parallel to L is an eigenvector with eigenvalue 1,
and any vector b2 which is perpendicular to L is an eigenvector with eigenvalue 0. Thus b1 and b2 form a basis of eigenvectors of the projection,
and the matrix of the projection in this basis is

[ 1  0 ]
[ 0  0 ]
3. Reflection about a line L. This is similar to the previous case.
Any vector b1 which is parallel to L is an eigenvector with eigenvalue 1
and any vector b2 which is perpendicular to L is an eigenvector with
eigenvalue -1. Thus b1 and b2 form a basis of eigenvectors of the reflection
and the matrix of the reflection in this basis is:

[ 1   0 ]
[ 0  -1 ]
4. Rotation through some angle (not equal to 0 and 180 degrees).
It is easy to see that the rotation does not have eigenvectors: no
non-zero vector is taken by the rotation to a scalar multiple of itself. Thus there are no bases of R2
in which the matrix of the rotation is diagonal.
Let T be a linear operator in an n-dimensional vector space V and let B
be any basis in V. Suppose that v is an eigenvector with the eigenvalue a.
Then T(v)=av. This means that the coordinate vectors of T(v) and v in the basis B
satisfy a similar condition:

[T]B * [v]B = [T(v)]B = [av]B = a[v]B.

Let us denote [T]B by A and [v]B by w. Then we have:

A*w = aw.
Recall that here w is an n-vector. A non-zero
vector w is
called an eigenvector of a matrix A with the eigenvalue a if
A*w=aw. Thus the problem of finding eigenvectors and eigenvalues
of linear operators in
n-dimensional spaces is reduced to the problem of finding eigenvectors and eigenvalues of their matrices.
We can rewrite the equality A*w=aw in the following form:

A*w = aI*w,

where I is the identity matrix. Moving the right hand side to the left,
we get:

(A - aI)*w = 0.
This is a homogeneous system of linear equations. If the matrix A has
an eigenvector w then this system must have a non-trivial solution (w is not
0 !). By the second
theorem
about invertible matrices this is equivalent to the condition
that the matrix A-aI is not invertible.
Now by the third
theorem this is equivalent to the condition that det(A-aI) is equal to zero.
Thus
a matrix A has an eigenvector with an eigenvalue a if and only if
det(A-aI)=0.
Notice that det(A-aI) is a polynomial in the unknown a. This polynomial
is called the characteristic polynomial of the matrix A.
Thus
in order to find all eigenvectors of a matrix A, one needs to find the
characteristic polynomial of A, find all its roots (these are the eigenvalues),
and for each of these
roots a, find the solutions of the homogeneous system (A-aI)*w=0.
Example. Let T be the linear operator in R2 with the standard
matrix

[ 1  2 ]
[ 2  1 ]

The characteristic polynomial det(A-aI) of this matrix A is
(1-a)^2-4. The roots are a=-1 and a=3. These are the eigenvalues of the operator T. Let a=-1. Then the system of equations (A-aI)*w=0 has the
form

2x + 2y = 0
2x + 2y = 0

where w=(x,y). The general solution is:

x = -t, y = t.

This solution gives the set of all eigenvectors with the eigenvalue -1.
It is a subspace spanned by the vector b1=(-1,1). Now let a=3. Then the system (A-aI)*w=0 has the form

-2x + 2y = 0
 2x - 2y = 0

The general solution is:

x = t, y = t.

Thus the set of eigenvectors with eigenvalue 3 forms a subspace
spanned by the vector (1,1).
Notice that the vectors (-1,1) and (1,1) form a basis in R2
because they are non-proportional and the dimension of R2 is 2.
The matrix of the operator T
in this basis is diagonal:

[ -1  0 ]
[  0  3 ]
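The hand computation in this example can be reproduced with numpy; numpy.linalg.eig returns unit eigenvectors, so they come out as scalar multiples of (-1,1) and (1,1), and the order of the eigenvalues may differ.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # [ 3. -1.] (order may differ)
print(eigenvectors)                  # columns: unit eigenvectors, proportional to (1,1) and (-1,1)

# diagonalization: P^(-1) A P is the diagonal matrix of eigenvalues
P = np.column_stack([[-1.0, 1.0], [1.0, 1.0]])   # columns: the eigenvectors b1=(-1,1), b2=(1,1)
print(np.linalg.inv(P) @ A @ P)      # [[-1.  0.]
                                     #  [ 0.  3.]]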