Theorem. Let V^{c} be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold.
1. V^{c} is a subspace of W.
2. The intersection of V and V^{c} is {0}.
3. Every vector w in W has a unique representation w=v+v' where v is from V and v' is from V^{c}.
4. dim(V)+dim(V^{c})=dim(W).
5. (V^{c})^{c}=V.
We shall return to this theorem later.
The following result shows an important connection between orthogonal complements and systems of linear equations.
Theorem. Let V be a subspace in R^{n} spanned by vectors s_{1}=(a_{11},...,a_{1n}), s_{2}=(a_{21},...,a_{2n}), ..., s_{k}=(a_{k1},...,a_{kn}). Then the following conditions for a vector v'=(x_{1},...,x_{n}) are equivalent:
1. v' belongs to V^{c};
2. v' is a solution of the homogeneous system of linear equations
a_{11}x_{1}+...+a_{1n}x_{n}=0
a_{21}x_{1}+...+a_{2n}x_{n}=0
...
a_{k1}x_{1}+...+a_{kn}x_{n}=0.
Thus V^{c} consists of all solutions of this system of equations.
The proof is left as
an exercise.
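The theorem above reduces membership in V^{c} to checking a homogeneous system. A minimal sketch in code (the example subspace and helper names are mine, not from the text):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def in_orthogonal_complement(spanning_vectors, v):
    """True if v solves the homogeneous system <s_i, v> = 0 for every
    spanning vector s_i, i.e. if v belongs to V^c."""
    return all(dot(s, v) == 0 for s in spanning_vectors)

# V spanned by s_1 = (1, 1, 0); then V^c is the plane x + y = 0
spanning = [(1, 1, 0)]
print(in_orthogonal_complement(spanning, (1, -1, 5)))   # True
print(in_orthogonal_complement(spanning, (1, 0, 0)))    # False
```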
Given an orthogonal basis {v_{1},...,v_{n}}, one can get an orthonormal basis
by dividing each v_{i} by its length: {v_{1}/||v_{1}||,...,v_{n}/||v_{n}||}.
Suppose that we have a (not necessarily orthogonal) basis {s_{1},...,s_{n}}
of a Euclidean
vector space V. The next procedure, called the Gram-Schmidt algorithm,
produces an orthogonal basis {v_{1},...,v_{n}} of V. Let v_{1}=s_{1}.
We shall find v_{2} as a linear combination of v_{1} and s_{2}:
v_{2}=s_{2}+xv_{1}. Since v_{2} must be orthogonal to v_{1}, we have:
<v_{2},v_{1}> = <s_{2},v_{1}> + x<v_{1},v_{1}> = 0,
so x = -<s_{2},v_{1}>/<v_{1},v_{1}> and
v_{2} = s_{2} - (<s_{2},v_{1}>/<v_{1},v_{1}>) v_{1}.
Next we can find v_{3} as a linear combination of s_{3}, v_{1} and v_{2}. A similar calculation gives that
v_{3} = s_{3} - (<s_{3},v_{1}>/<v_{1},v_{1}>) v_{1} - (<s_{3},v_{2}>/<v_{2},v_{2}>) v_{2}.
Continuing in this manner, we get the formula for v_{i+1}:
v_{i+1} = s_{i+1} - (<s_{i+1},v_{1}>/<v_{1},v_{1}>) v_{1} - ... - (<s_{i+1},v_{i}>/<v_{i},v_{i}>) v_{i}.
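The procedure above translates directly into code. A minimal sketch (helper names and the example basis are mine, not from the text):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(basis):
    """Turn a basis s_1,...,s_n into an orthogonal basis v_1,...,v_n."""
    ortho = []
    for s in basis:
        v = list(s)
        for w in ortho:
            # subtract the component of s along w: (<s,w>/<w,w>) w
            c = dot(s, w) / dot(w, w)
            v = [vi - c * wi for vi, wi in zip(v, w)]
        ortho.append(v)
    return ortho

v1, v2, v3 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(dot(v1, v2), dot(v1, v3), dot(v2, v3))  # all three are 0 (up to rounding)
```

To obtain an orthonormal basis, divide each v_{i} by its length afterward, as described earlier.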
By construction, the set of vectors {v_{1},...,v_{n}} is orthogonal. None of these
vectors is 0. Indeed, if v_{i} were equal to
0 then s_{i}, v_{1},...,v_{i-1} would be linearly
dependent, which would imply that s_{1},...,s_{i} are linearly dependent (express
each v_{j} as a linear combination of s_{1},...,s_{j}), which is impossible since
{s_{1},...,s_{n}} is a basis.
This implies
that {v_{1},...,v_{n}} are linearly independent. Indeed, the following general
theorem holds.
Theorem. Every set of pairwise orthogonal non-zero vectors is linearly independent.
Proof. By contradiction
let {v_{1},...,v_{n}}
be a linearly dependent set of pairwise orthogonal non-zero vectors.
Then by the theorem about linearly independent sets:
x_{1}v_{1}+...+x_{n}v_{n} = 0
for some numbers x_{i} not all of which are zero. Suppose that x_{i} is not 0. Take the dot product of the last equality with v_{i}:
x_{1}<v_{1},v_{i}>+...+x_{n}<v_{n},v_{i}> = 0.
Since v_{i} is orthogonal to every vector v_{j} except v_{i} itself, each dot product <v_{j},v_{i}> is 0 except <v_{i},v_{i}>. So
x_{i}<v_{i},v_{i}> = 0.
Since <v_{i},v_{i}> is not 0 (because v_{i} is not zero), we conclude that x_{i}=0.
This is a contradiction since we assumed that x_{i} is not 0.
Thus {v_{1},...,v_{n}} is a linearly independent set of vectors in the Euclidean vector space V. Since dim(V)=n, by the theorem
about dimension, this set is a basis in V. Therefore the Gram-Schmidt procedure
gives an orthogonal basis of V.
If a basis {v_{1},...,v_{n}} is orthogonal, then it is very easy to find
the coordinates of a vector a in this basis. Indeed, suppose that
a=x_{1}v_{1}+...+x_{n}v_{n}. If we multiply this equality by v_{i}, we get:
<a,v_{i}> = x_{i}<v_{i},v_{i}>
(all other terms are 0 because <v_{i},v_{j}>=0 if i is not equal to j). Thus
x_{i} = <a,v_{i}>/<v_{i},v_{i}>.
This is the formula for the i-th coordinate of a (i=1,2,...,n).
Notice that if the basis is orthonormal, the formula is even simpler:
x_{i} = <a,v_{i}>,
because in this case <v_{i},v_{i}>=||v_{i}||^{2}=1.
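The coordinate formula is easy to apply in code. A small sketch (the basis and vector are a made-up example):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def coordinates(a, ortho_basis):
    """Coordinates of a in an orthogonal basis: x_i = <a,v_i>/<v_i,v_i>."""
    return [dot(a, v) / dot(v, v) for v in ortho_basis]

# orthogonal basis of R^2: v_1 = (1,1), v_2 = (1,-1)
coords = coordinates([3, 1], [[1, 1], [1, -1]])
print(coords)  # [2.0, 1.0], since (3,1) = 2*(1,1) + 1*(1,-1)
```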
Now we can return to the theorem about orthogonal complements. Here is its
formulation:
Theorem. Let V^{c} be the orthogonal complement of a subspace V in a Euclidean vector space W. Then the following properties hold.
1. V^{c} is a subspace of W.
2. The intersection of V and V^{c} is {0}.
3. Every vector w in W has a unique representation w=v+v' where v is from V and v' is from V^{c}.
4. dim(V)+dim(V^{c})=dim(W).
5. (V^{c})^{c}=V.
Proof.
We leave 1 and 2
as exercises.
3,4. Take any basis {s_{1},...,s_{k}} of V. By the
theorem about dimension, one can add several vectors s_{k+1},..., s_{n}
and get a basis of W. Let us apply the Gram-Schmidt method to the basis
{s_{1},...,s_{n}} and get an orthogonal basis {v_{1},...,v_{n}} of W.
Then by construction, the first k vectors v_{1},...,v_{k} belong to V (they are linear
combinations of s_{1},...,s_{k}). By the theorem
about dimension, v_{1},...,v_{k} form a basis of V. Let us show that v_{k+1},...,v_{n} form a basis of V^{c}. Indeed, each of these vectors is orthogonal to each of
v_{1},...,v_{k}. Therefore each of v_{k+1},...,v_{n} is orthogonal to every linear
combination of v_{1},...,v_{k}. So v_{k+1},...,v_{n} are orthogonal to V. Thus
these vectors belong to V^{c}. They are linearly independent by the theorem saying that every orthogonal set of non-zero vectors
is linearly independent. Let w be any vector in V^{c}. Then
w = x_{1}v_{1}+...+x_{n}v_{n}
for some numbers x_{1},...,x_{n}. Multiplying this equality by v_{1},...,v_{k}, we get that x_{1},...,x_{k} are 0.
So every vector in V^{c} is a linear combination of v_{k+1},...,v_{n}. Thus
v_{k+1},...,v_{n} is a basis of V^{c}. This immediately gives us the property
4.
Take any vector w in W. Then w=x_{1}v_{1}+...+x_{n}v_{n}=(x_{1}v_{1}+...+x_{k}v_{k})+(x_{k+1}v_{k+1}+...+x_{n}v_{n}). So every vector in W is a sum of a vector in V and a vector in V^{c}.
Let us prove that this representation is unique. Suppose (by contradiction) that
w = u + u' = v + v'
where u and v are from V, u' and v' are from V^{c}, and either u is not equal to v or u' is not equal to v'. Then we can deduce that
v - u = u' - v'.
Notice that since V and V^{c} are subspaces, v-u belongs to V and u'-v' belongs to V^{c}. Let us multiply both sides of this equality by v-u. We get:
<v-u, v-u> = <u'-v', v-u>.
The right hand side of this equality is 0 because u'-v' is orthogonal to v-u (and to any other vector from V). So <v-u,v-u>=0. By a property of dot products, we conclude that v-u=0, that is, u=v. Similarly we get u'=v'. This contradiction completes the proof of part 3.
5. It is clear that (V^{c})^{c} contains V. On the other hand by part 4,
dim((V^{c})^{c})=dim(V)=dim(W)-dim(V^{c}). Therefore V=(V^{c})^{c}.
The theorem about orthogonal complements tells us
that if V is a subspace of a Euclidean vector space W and w is a vector from W
then w=v+v' for some v in V and v' in the orthogonal complement V^{c} of V. We
also know that this representation of w is unique.
The vector v is then called the projection of w onto V;
the vector v' is called the normal component
of w.
The Gram-Schmidt procedure gives us the formula for the projection and
the normal component. If v_{1},...,v_{k} is an orthogonal basis of the subspace V
then
v = (<w,v_{1}>/<v_{1},v_{1}>) v_{1} + (<w,v_{2}>/<v_{2},v_{2}>) v_{2} + ... + (<w,v_{k}>/<v_{k},v_{k}>) v_{k}.
Indeed, v belongs to V because each v_{i} is in V and V is closed under
linear combinations. And it is easy to check (exercise) that v'=w-v is orthogonal to each of v_{i}. This implies
that v' is orthogonal to every vector in V because vectors in V are linear
combinations of v_{1},...,v_{k} (
check that if a vector p is orthogonal to vectors q_{1},...,q_{n}, then p is
orthogonal to any linear combination of q_{1},...,q_{n}).
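The projection formula above can be sketched in code (the subspace and vector below are a made-up example, not from the text):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def projection(w, ortho_basis):
    """Projection of w onto V = span of an orthogonal basis v_1,...,v_k:
    the sum of (<w,v_i>/<v_i,v_i>) v_i over the basis."""
    p = [0.0] * len(w)
    for v in ortho_basis:
        c = dot(w, v) / dot(v, v)
        p = [pi + c * vi for pi, vi in zip(p, v)]
    return p

w = [1.0, 2.0, 3.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # V = the xy-plane
p = projection(w, basis)                      # projection of w onto V
n = [wi - pi for wi, pi in zip(w, p)]         # normal component w - p
print(p, n)  # [1.0, 2.0, 0.0] [0.0, 0.0, 3.0]
```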
The theorem about orthogonal complements allows
us to find distances from vectors to subspaces in any Euclidean vector space.
Theorem. Let V be a subspace in a Euclidean vector space W and let w be
a vector from W. Let w=v+v' where v is the projection of w onto V and v' is
the normal component (as in the theorem about orthogonal complements). Then ||v'|| is the distance from w to V and v is the vector in V closest to w.
Suppose that a system of linear equations Av=b with the m by n matrix of coefficients A does not have a solution. In this case we can look for a vector v for which Av is as close as possible to b. Such a vector v is called a least squares solution of the system Av=b. The term is motivated by the fact that if v is a least squares solution of Av=b, then the sum of squares of the coordinates (the square of the norm) of Av-b must be minimal.
The set V of vectors of the form Av is the range of the linear transformation with the standard matrix A, so it is the column space of the matrix A. In fact, if v=(x_{1},...,x_{n}) then
Av = x_{1}c_{1}+...+x_{n}c_{n},
where c_{1},...,c_{n} are the column vectors of the matrix A.
Thus we need to find
the vector p in V such that the distance from b to p is the smallest.
From the theorem about the distance from a vector to a subspace we know that p is the projection of b onto
V. Thus in order to find v we need to execute the following procedure:
1. Find an orthogonal basis of the column space V of the matrix A (for example, by applying the Gram-Schmidt method to a basis of V).
2. Compute the projection p of b onto V.
3. Solve the system Av=p.
There exists an alternative procedure of finding a least squares
solution. Notice that v is a least squares solution of the system Av=b
if and only if Av-b is orthogonal to V (the column space of the matrix A).
By the theorem about orthogonal complements in
R^{n} a vector z is orthogonal to V if and only if z is a solution of the
system A^{T} z=0 where A^{T} is the transpose of A.
Thus v is a least squares solution of Av=b if and only if v is a solution of the system A^{T}(Av-b)=0, or equivalently
A^{T}A v = A^{T}b.
Thus in order to find a least squares solution of the system Av=b
one needs to solve another system of linear equations (with the
matrix of coefficients A^{T}A).
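This procedure can be sketched for a small example. The code below fits a line y = c0 + c1*x to three points by solving the normal equations A^{T}A v = A^{T}b; the data points and helper names are mine, chosen for illustration:

```python
def matmul_T_A(A):
    """Compute A^T A."""
    n = len(A[0])
    return [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(n)]
            for i in range(n)]

def matvec_T(A, b):
    """Compute A^T b."""
    return [sum(A[r][i] * b[r] for r in range(len(A))) for i in range(len(A[0]))]

def solve2(M, rhs):
    """Solve a 2x2 system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    y = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return [x, y]

# points (0,1), (1,2), (2,4); the columns of A correspond to c0 and c1
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]
c0, c1 = solve2(matmul_T_A(A), matvec_T(A, b))
print(c0, c1)  # 5/6 and 1.5: the least squares line is y = 5/6 + 1.5*x
```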
There exists yet another procedure for finding a least squares solution of a system of linear equations Av=b.
We need to minimize the distance
dist(b, Av). This distance is a function in n variables x_{1},...,x_{n}. Thus our
problem is just the problem of finding a minimum of a function in n variables.
In fact since the formula for dist(b, Av) contains a square root, it is more
convenient to minimize the function f(v)=dist(b, Av)^{2} which is just a quadratic
polynomial. It can be done by finding the gradient of this function
and solving the system of equations grad(f)=0.
We know that vectors in different bases may have different coordinates,
thus when we deal with vector spaces we often need to change a basis to
get better coordinates for the vectors we are working with.
Let B={v_{1},...,v_{n}} and B'={s_{1},...,s_{n}} be bases of a vector space V.
Then every vector from B is a linear combination of vectors from B':
v_{i} = a_{i1}s_{1} + a_{i2}s_{2} + ... + a_{in}s_{n} (i=1,...,n).
Consider the following matrix A:
[ a_{11} a_{21} ... a_{n1} ]
[ a_{12} a_{22} ... a_{n2} ]
[ ..................... ]
[ a_{1n} a_{2n} ... a_{nn} ]
This matrix is called the
transition (transformation) matrix from B to B'.
Take any vector w in V. Suppose that w has coordinates (x_{1},...,x_{n}) in
the basis B. This means that w=x_{1}v_{1}+...+x_{n}v_{n}. By substituting
the expressions of v_{i} into this equality, we can get the expression of
w as a linear combination of vectors from B'. The coefficients in
this linear combination are the coordinates of w in the basis B'. It is easy to
compute that the vector of coordinates that we get will be equal to Av
where v=(x_{1},...,x_{n}). Thus in
order to get the coordinate vector v' of w
in the
basis B', one needs to multiply the transition matrix A
from B to B' by
the coordinate vector v of w in the basis B:
v' = Av.
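A small sketch of this coordinate change (the two bases below are a made-up example):

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

# B = {(1,1), (1,-1)}, B' = the standard basis of R^2.
# v_1 = 1*e_1 + 1*e_2 and v_2 = 1*e_1 - 1*e_2, so the transition matrix
# from B to B' has these coordinate vectors as its columns:
A = [[1, 1],
     [1, -1]]

v = [2, 1]                 # coordinates of some vector w in the basis B
v_prime = matvec(A, v)     # coordinates of the same w in the basis B'
print(v_prime)  # [3, 1], since 2*(1,1) + 1*(1,-1) = (3,1)
```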
Let B={b_{1},...,b_{n}} be a basis in an n-dimensional vector space V. For every vector v in V let [v]_{B} denote the column vector of coordinates of v in the basis B. The matrix [T]_{B} of a linear operator T in the basis B is the matrix whose columns are the coordinate vectors [T(b_{1})]_{B},...,[T(b_{n})]_{B}.
Recall
that we constructed the standard
matrix of a linear operator T in R^{n} in a similar manner: we took the standard
basis in R^{n}, v_{1}=(1,0,...,0), ..., v_{n}=(0,0,...,1), and then the columns of the standard
matrix of T were the images T(v_{1}),...,T(v_{n}) (which are equal to the coordinate
vectors of T(v_{1}),...,T(v_{n}) in the standard basis).
The theorem
about characterization of linear transformations from R^{m} to R^{n} tells us
that the image of every vector v from R^{m} under the transformation T is equal to
the product of the standard matrix of T and the (column) vector v. Almost
exactly the same argument proves the following statement.
Theorem. If [T]_{B} is the matrix of a linear operator T in a vector space V
then for every vector v in V we have:
[T(v)]_{B} = [T]_{B}[v]_{B}.
In other words: the coordinate vector of the image of v
is equal to the
product of the matrix [T]_{B} and the coordinate vector
of v.
The proof is left as an exercise.
Example. Let T be an operator in R^{2} with the standard matrix
[ 1 2 ]
[ 2 1 ]
Let B be the basis consisting of two vectors b_{1}=(1,1) and b_{2}=(1,2)
(prove that it is a basis).
Let us construct the matrix [T]_{B} of the operator T in the basis B.
By definition, the columns of the matrix [T]_{B} are the coordinate vectors of
T(b_{1}) and T(b_{2}) in the basis B. It is easy to compute that
T(b_{1})=(3,3)=3b_{1}+0b_{2}. So the coordinate vector of T(b_{1}) in the basis B
is (3,0).
Now T(b_{2})=(5,4)=6b_{1}-b_{2}. So the coordinate vector of T(b_{2}) in the basis B
is (6, -1). Thus the matrix of the operator T in the basis B is
[ 3 6 ]
[ 0 -1 ]
The previous example shows that a linear operator can have different
matrices in different bases. The goal (very important in many applications of linear algebra
to mathematics and physics) is to find a basis in which the matrix of
a given linear operator is as simple as possible.
First of all let us find out what happens to the matrix of an operator
when we change the basis. Let T be a linear operator in a vector space V
and let B_{1} and B_{2} be two bases in V. Then we have the transition matrix
M(B_{1},B_{2}) from B_{1} to B_{2} and the inverse transition matrix M(B_{2},B_{1})
from B_{2} to B_{1}. We know that for every vector v in V we have:
[v]_{B1} = M(B_{2},B_{1})[v]_{B2} and [T(v)]_{B1} = M(B_{2},B_{1})[T(v)]_{B2}.
We also know that
[T(v)]_{B1} = [T]_{B1}[v]_{B1} and [T(v)]_{B2} = [T]_{B2}[v]_{B2}.
Using these formulas we get:
M(B_{2},B_{1})[T]_{B2}[v]_{B2} = [T(v)]_{B1} = [T]_{B1}[v]_{B1} = [T]_{B1}M(B_{2},B_{1})[v]_{B2}.
Multiplying by the inverse of M(B_{2},B_{1}) (that is, M(B_{1},B_{2})), we get:
[T]_{B2}[v]_{B2} = M(B_{2},B_{1})^{-1}[T]_{B1}M(B_{2},B_{1})[v]_{B2}
for every vector v.
This means that [T]_{B2}, the matrix of the operator T in the basis B_{2},
is equal to M(B_{2},B_{1})^{-1} * [T]_{B1} * M(B_{2},B_{1}). This is the connection we were
looking for.
Two matrices C and D are called similar if
there exists an invertible matrix M such that
D = M^{-1}CM.
Thus we can say that
matrices of the same operator in different bases
are similar.
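We can check this on the earlier example: the operator with standard matrix [[1,2],[2,1]] and the basis b_{1}=(1,1), b_{2}=(1,2). The transition matrix M from that basis to the standard one has b_{1} and b_{2} as columns, and M^{-1}[T]M should reproduce the matrix [ 3 6 / 0 -1 ] computed above. A sketch (2x2 helpers are mine):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

T_std = [[1.0, 2.0], [2.0, 1.0]]    # [T] in the standard basis
M = [[1.0, 1.0], [1.0, 2.0]]        # columns are b_1 = (1,1) and b_2 = (1,2)

T_B = matmul(inverse2(M), matmul(T_std, M))
print(T_B)  # [[3.0, 6.0], [0.0, -1.0]]
```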
The simplest possible matrices are diagonal. When is it possible to find a basis in which the matrix
of a given operator is diagonal?
Suppose that an operator T in a vector space V has a diagonal matrix [T]_{B}
in a basis B = {b_{1}, b_{2}, ..., b_{n}}:
[ a_{1}  0    0   ...  0   ]
[ 0    a_{2}  0   ...  0   ]
[ 0    0    a_{3} ...  0   ]
[ ...................... ]
[ 0    0    0   ...  a_{n} ]
The i-th column of this matrix is the coordinate vector of T(b_{i}) in the basis B, so T(b_{i})=a_{i}b_{i} for every i. We see that
if an operator has diagonal matrix in some basis,
this basis must consist of eigenvectors of this operator; the
numbers on the diagonal of this diagonal matrix are the corresponding
eigenvalues.
Examples. Let V=R^{2}. We have considered
some types of linear operators in R^{2}. Let us check which of these operators
have bases of eigenvectors.
1. Dilations. A dilation operator takes every vector v to kv for some
fixed number k. Thus every vector is an eigenvector of the dilation operator.
So every basis is a basis of eigenvectors. Notice that the matrix of the dilation operator in any basis is
[ k 0 ]
[ 0 k ]
2. Orthogonal projection onto a line L. Any vector b_{1} which is parallel to L is an eigenvector with eigenvalue 1 and any vector b_{2} which is perpendicular to L is an eigenvector with eigenvalue 0. Thus b_{1} and b_{2} form a basis of eigenvectors of the projection, and the matrix of the projection in this basis is:
[ 1 0 ]
[ 0 0 ]
3. Reflection about line L. This is similar to the previous case.
Any vector b_{1} which is parallel to L is an eigenvector with eigenvalue 1
and any vector b_{2} which is perpendicular to L is an eigenvector with
eigenvalue -1. Thus b_{1} and b_{2} form a basis of eigenvectors of the reflection
and the matrix of the reflection in this basis is:
[ 1 0 ]
[ 0 -1 ]
4. Rotation through some angle (not equal to 0 and 180 degrees).
It is easy to see that the rotation has no eigenvectors: no non-zero vector is taken by the rotation to a scalar multiple of itself. Thus there are no bases of R^{2}
in which the matrix of the rotation is diagonal.
Let T be a linear operator in an n-dimensional vector space V and let B
be any basis in V. Suppose that v is an eigenvector with the eigenvalue a.
Then T(v)=av. This means that the coordinate vectors of T(v) and v in the basis B
satisfy a similar condition:
[T]_{B}[v]_{B} = a[v]_{B}.
Let us denote [T]_{B} by A and [v]_{B} by w. Then we have:
A*w = aw.
Recall that here w is an n-vector. A non-zero
vector w is
called an eigenvector of a matrix A with the eigenvalue a if
A*w=aw. Thus the problem of finding eigenvectors and eigenvalues
of linear operators in
n-dimensional spaces is reduced to the problem of finding eigenvectors and eigenvalues of their matrices.
We can rewrite the equality A*w=aw in the following form:
A*w = aI*w,
where I is the identity matrix. Moving the right hand side to the left, we get:
(A-aI)*w = 0.
This is a homogeneous system of linear equations. If the matrix A has
an eigenvector w then this system must have a non-trivial solution (w is not 0!). By the second
theorem
about invertible matrices this is equivalent to the condition
that the matrix A-aI is not invertible.
Now by the third
theorem this is equivalent to the condition that det(A-aI) is zero.
Thus
a matrix A has an eigenvector with an eigenvalue a if and only if
det(A-aI)=0.
Example.
Let T be the linear operator in R^{2} with the standard
matrix
[ 1 2 ]
[ 2 1 ]
The characteristic polynomial det(A-aI) is
(1-a)^{2}-4. The roots are a=-1 and a=3. These are the eigenvalues of the operator T.
Let a=-1. Then the system of equations (A-aI)*w=0 has the form
2x + 2y = 0
2x + 2y = 0
where w=(x,y). The general solution is:
x = -t
y = t
This solution gives the set of all eigenvectors with the eigenvalue -1.
It is a subspace spanned by the vector b_{1}=(-1,1).
Now let a=3. Then the system (A-aI)*w=0 has the form
-2x + 2y = 0
2x - 2y = 0
where w=(x,y). The general solution is:
x = t
y = t
Thus the set of eigenvectors with eigenvalue 3 forms a subspace
spanned by the vector (1,1).
Notice that the vectors (-1,1) and (1,1) form a basis in R^{2}
because they are non-proportional and the dimension of R^{2} is 2.
The matrix of the operator T
in this basis is diagonal:
[ -1 0 ]
[ 0 3 ]
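The whole example can be verified in a few lines of code: find the roots of the characteristic polynomial (1-a)^2 - 4 = a^2 - 2a - 3 with the quadratic formula, then check A*w = aw for the two eigenvectors:

```python
import math

A = [[1.0, 2.0], [2.0, 1.0]]

# roots of a^2 - 2a - 3 = 0 by the quadratic formula
roots = sorted([(2 - math.sqrt(4 + 12)) / 2, (2 + math.sqrt(4 + 12)) / 2])
print(roots)  # [-1.0, 3.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

# check A*w = a*w for the eigenvectors found above
print(matvec(A, [-1.0, 1.0]))  # [1.0, -1.0]  = -1 * (-1, 1)
print(matvec(A, [1.0, 1.0]))   # [3.0, 3.0]   =  3 * (1, 1)
```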