There is yet another way to look at systems of linear equations. Suppose that we want to find all solutions of the following system of linear equations:

Av = b
where A is an m by n matrix of coefficients and b is the column of right-hand sides. For every n-vector v we get an m-vector Av.
Our goal is to find
all n-vectors v such that this m-vector is b.
Thus we have a function which takes every vector v from Rn to the vector Av from Rm, and our goal is to find all values of the argument at which this function takes the particular value b.
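For readers who want to experiment, here is a minimal computational sketch of this point of view (in Python with the numpy library; the matrix A and the right-hand side b are made up for illustration). The function v -> Av is just matrix-vector multiplication, and solving the system means finding the arguments at which this function takes the value b.

    import numpy as np

    # A made-up 2 by 2 coefficient matrix and right-hand side.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([5.0, 10.0])

    # The function which takes every 2-vector v to the 2-vector Av.
    def T(v):
        return A @ v

    # Solving the system Av = b means finding the arguments at which T
    # takes the value b.
    v = np.linalg.solve(A, b)
    print(v)     # [1. 3.]
    print(T(v))  # [ 5. 10.] -- equal to b, as required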
A function from
Rn to Rm which takes every n-vector v to the m-vector
Av, where A is an m by n matrix, is called a linear transformation.
The matrix A is called the standard matrix
of this transformation. If n=m then the transformation
is called a
linear operator of the vector space Rn.
Notice that, by the definition, the linear transformation with standard matrix A takes every vector (x1, ..., xn) from Rn to the vector

(A(1,1)x1 + ... + A(1,n)xn, ..., A(m,1)x1 + ... + A(m,n)xn)

from Rm, where A(i,j) are the entries of A.
Conversely, every transformation from Rn to Rm given by a formula of this
kind is a linear transformation and the coefficients A(i,j) form the standard
matrix of this transformation.
Examples. 1. Consider the transformation of R2 which takes
each vector (a,b) to the opposite vector (-a,-b). This is a linear operator
with standard matrix
[ -1   0 ]
[  0  -1 ]
2. More generally, the dilation operator is the linear operator from Rn to Rn which takes every vector v to the vector kv, where k is a constant.
3. If we take a vector (x,y) in R2 and reflect it about the x-axis, we get vector (x,-y). Clearly, this reflection is a linear operator. Its standard matrix is
[ 1   0 ]
[ 0  -1 ]
4. If we project a vector (x,y) on the x-axis, we get vector (x,0). This projection is also a linear operator. Its standard matrix is
[ 1  0 ]
[ 0  0 ]
5. If we rotate a vector (x,y) through 90 degrees counterclockwise, we get vector (-y, x). This rotation is a linear operator with standard matrix
[ 0  -1 ]
[ 1   0 ]
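These standard matrices are easy to check with numbers. The following sketch (the test vector is arbitrary) applies the matrices of Examples 3, 4 and 5 to a vector (x, y) and prints the expected images (x, -y), (x, 0) and (-y, x).

    import numpy as np

    v = np.array([3.0, 2.0])  # an arbitrary test vector (x, y)

    reflection = np.array([[1, 0], [0, -1]])  # reflection about the x-axis
    projection = np.array([[1, 0], [0, 0]])   # projection on the x-axis
    rotation   = np.array([[0, -1], [1, 0]])  # rotation through 90 degrees

    print(reflection @ v)  # [ 3. -2.] = (x, -y)
    print(projection @ v)  # [ 3.  0.] = (x, 0)
    print(rotation @ v)    # [-2.  3.] = (-y, x)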
We shall prove that reflections about arbitrary lines, projections on
arbitrary axes, and rotations through arbitrary angles in R2 are linear
operators. In order to do this we need the following simple
characterization of linear transformations from Rn to Rm.
Theorem. A function T from Rn to Rm is a linear transformation if and only if it satisfies the following two properties:

1. T(u + v) = T(u) + T(v) for all vectors u and v in Rn;
2. T(kv) = kT(v) for every vector v in Rn and every scalar k.
The proof of this theorem shows that if T is a linear transformation and Vi (i = 1, ..., n) is the vector with i-th coordinate 1 and all other coordinates 0, then T(Vi) is the i-th column of the standard matrix of T.
This provides us with
a way to find the standard matrix of a linear transformation.
Notice that in R3, vectors V1, V2, V3 are the basic vectors i, j, k. So we shall call Vi the
basic vectors in Rn. We shall give a general definition of bases
in Rn and other vector spaces later.
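This recipe can be carried out mechanically. The sketch below (the transformation T is an arbitrary example chosen for illustration) applies T to the basic vectors V1, ..., Vn and assembles the images as the columns of the standard matrix.

    import numpy as np

    # An arbitrary linear transformation from R3 to R2, given by a formula:
    # T(x, y, z) = (x + 2y, 3z - y)
    def T(v):
        x, y, z = v
        return np.array([x + 2*y, 3*z - y])

    n = 3
    # T(Vi) is the i-th column of the standard matrix, where Vi is the
    # vector with i-th coordinate 1 and all other coordinates 0.
    A = np.column_stack([T(np.eye(n)[i]) for i in range(n)])
    print(A)
    # [[ 1.  2.  0.]
    #  [ 0. -1.  3.]]

    # Check that Av agrees with T(v) on a sample vector.
    v = np.array([1.0, 2.0, 3.0])
    print(A @ v, T(v))  # both [5. 7.]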
As a corollary of the characterization of linear transformations from
Rm to Rn we can deduce the following statement.
Corollary. Every linear transformation T from Rm to Rn
takes 0 of Rm to 0 of Rn.
Indeed, take k = 0 and an arbitrary vector A; then

T(0) = T(0A) = 0T(A) = 0.

Here we used the second condition of the characterization.
Example 1. Projection on an arbitrary line in R2.
Let L be the line in R2 given by the equation y = kx. Let TL be the transformation of R2 which takes every 2-vector to its projection on L. It is clear that the projection of the sum of two vectors is the sum of the projections of these vectors, and if we multiply a vector by a scalar then its projection is multiplied by the same scalar. Thus, by the characterization of linear transformations, TL is a linear operator on R2.
From this, we can deduce that the columns of the standard matrix of TL are the projections of the basic vectors (1,0) and (0,1) on L. So

TL(1,0) = (1/(k^2+1)) (1, k) and TL(0,1) = (1/(k^2+1)) (k, k^2).

Therefore the standard matrix of the projection is

1/(k^2+1) [ 1   k   ]
          [ k   k^2 ]
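As a quick numerical check (the slope k is arbitrary), the sketch below builds this matrix and verifies two properties a projection must have: vectors lying on L are fixed, and projecting twice is the same as projecting once.

    import numpy as np

    k = 2.0  # an arbitrary slope; L is the line y = kx
    P = np.array([[1, k], [k, k**2]]) / (k**2 + 1)

    on_line = np.array([1.0, k])  # a vector lying on L
    print(P @ on_line)            # [1. 2.] -- vectors on L are fixed

    print(np.allclose(P @ P, P))  # True: projecting twice = projecting once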
Example 2. Reflection about an arbitrary line.
Let L again be the line y = kx and let TL be the projection on L. The reflection SL about L satisfies SL(v) = 2TL(v) - v for every vector v, because the projection of v on L is the midpoint between v and its reflection. So SL = 2TL - Id, and its standard matrix is 2P - I, where P is the projection matrix found above. This gives us the standard matrix of the reflection:
1/(k^2+1) [ 1-k^2   2k    ]
          [ 2k      k^2-1 ]
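Again this can be checked with numbers (same arbitrary k): reflecting twice about L returns every vector to its original position, so the square of this matrix must be the identity matrix.

    import numpy as np

    k = 2.0  # an arbitrary slope; L is the line y = kx
    F = np.array([[1 - k**2, 2*k], [2*k, k**2 - 1]]) / (k**2 + 1)
    print(np.allclose(F @ F, np.eye(2)))  # True: the reflection undoes itself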
Example 3. The rotation through angle a (counterclockwise) is a linear operator with standard matrix

[ cos(a)  -sin(a) ]
[ sin(a)   cos(a) ]
Notice that the rotation clockwise by angle a has the following matrix:
[  cos(a)  sin(a) ]
[ -sin(a)  cos(a) ]
because it is equal to the rotation
counterclockwise through the angle -a.
Suppose that T is a linear transformation from Rm to Rn with standard
matrix A and S is a
linear transformation from Rn to Rk with standard matrix B. Then we can
compose or multiply
these two transformations and create a new
transformation ST which takes vectors from Rm to Rk. This transformation
first applies T and then S. Not every pair of transformations can be multiplied: the transformation S must start where T ends. But any two linear operators in Rn (that is, linear transformations from Rn to Rn) can be multiplied.
Notice that if v is a vector in Rm then T(v) = Av, by the definition of the standard matrix of a linear transformation. Then

ST(v) = S(T(v)) = B(Av) = (BA)v.

Thus the product ST is a linear transformation and the standard matrix of ST is the product BA of the standard matrices.
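Here is a small numerical illustration of this fact (the matrices A and B are made up): applying T and then S step by step gives the same result as applying the single transformation with standard matrix BA.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0]])  # standard matrix of T (made up)
    B = np.array([[3.0, 0.0],
                  [1.0, 1.0]])  # standard matrix of S (made up)

    v = np.array([1.0, 1.0])

    step_by_step = B @ (A @ v)  # apply T first, then S
    composed = (B @ A) @ v      # apply the transformation with matrix BA

    print(step_by_step, composed)  # both [9. 4.]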
Example 1. Suppose that T and S are rotations in R2,
T rotates through angle a and S rotates through angle b (all rotations
are counterclockwise). Then
ST is of course the rotation through angle a+b. The standard matrix of T is

[ cos(a)  -sin(a) ]
[ sin(a)   cos(a) ]

and the standard matrix of S is

[ cos(b)  -sin(b) ]
[ sin(b)   cos(b) ]

Their product BA is

[ cos(a)cos(b)-sin(a)sin(b)   -cos(a)sin(b)-sin(a)cos(b) ]
[ cos(a)sin(b)+sin(a)cos(b)    cos(a)cos(b)-sin(a)sin(b) ]

which, by the angle addition formulas, is equal to

[ cos(a+b)  -sin(a+b) ]
[ sin(a+b)   cos(a+b) ]

the standard matrix of the rotation through a+b, as expected.
Example 2. The reflection about the line L (given by y = kx) can be decomposed into a product of three operators: the rotation taking L to the x-axis, the reflection about the x-axis, and the rotation taking the x-axis back to L. Thus we could find the standard matrix of the reflection about the line L by multiplying the standard matrices of these three transformations. Similarly, the projection on L can be decomposed into a product of three operators: the same two rotations with the projection on the x-axis between them.
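This decomposition can also be verified numerically (the angle t is arbitrary): rotating L onto the x-axis, reflecting about the x-axis, and rotating back reproduces the reflection matrix found earlier, with k = tan(t).

    import numpy as np

    t = 0.7        # an arbitrary angle; L is the line through the origin at angle t
    k = np.tan(t)  # the slope of L

    def rot(a):    # standard matrix of the rotation through angle a
        return np.array([[np.cos(a), -np.sin(a)],
                         [np.sin(a),  np.cos(a)]])

    Fx = np.array([[1, 0], [0, -1]])  # reflection about the x-axis

    # Rotate L onto the x-axis, reflect, rotate back.
    F = rot(t) @ Fx @ rot(-t)

    F_formula = np.array([[1 - k**2, 2*k], [2*k, k**2 - 1]]) / (k**2 + 1)
    print(np.allclose(F, F_formula))  # True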
We can also multiply a linear transformation by a scalar. If k is a number and T is a linear transformation from Rm to Rn with standard matrix A, then kT is the function from Rm to Rn which takes every vector V from Rm to k times T(V). It is easy to see that the standard matrix of kT is kA.
Summarizing the properties of linear transformations from Rm to Rn that
we have obtained so far, we can formulate the following theorem.
Theorem. 1. The product ST of
a linear transformation T from Rm to Rn and a linear transformation S
from Rn to Rk is a linear transformation from Rm to Rk
and the standard matrix
of ST is equal to the product of standard matrices of S and T.
2. If T and S are linear transformations from Rm to Rn then T+S is
again a
linear transformation from Rm to Rn and the standard matrix of
this transformation is equal to the sum of standard matrices of T and S.
3. If T is a linear transformation from Rm to Rn and k is a scalar then
kT is again a
linear transformation from Rm to Rn and the standard matrix of
this transformation is equal to k times the standard matrix of T.
By definition, the identity
function from Rn to Rn
is the function which takes every vector to itself. It is clear that the identity function is a linear operator whose standard matrix is the identity matrix.
Let us denote the identity operator by Id.
A linear operator T in Rn is called invertible if there exists another linear operator S in Rn such that TS=ST=Id. In this case S is called the inverse of T. By definition S undoes what T does, that is if T takes V to W then S must take W to V (otherwise ST would not be the identity operator). If A is the standard matrix of T and B is the standard matrix of S then ST has standard matrix BA. So if S is the inverse of T then BA=I. Conversely, if BA=I then the linear operator S with standard matrix B is the inverse of T because ST is the linear operator whose standard matrix is I. Thus we can conclude that the following statement is true.
Theorem. A linear operator T in Rn is invertible if and only if its standard matrix is invertible. If A is the standard matrix of T then A^-1 is the standard matrix of T^-1.
Example 1. The reflection about a line in R2 is invertible
and the inverse of a reflection is the reflection itself (indeed, if we apply the reflection to a vector twice, we do not change the vector).
Example 2. The rotation through angle a is invertible and the inverse is the rotation through angle -a.
Example 3. The projection on a line in R2 is not invertible because there are many vectors taken by the projection to the same vector, so we cannot uniquely reconstruct a vector by its image under the projection.
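These examples are easy to check numerically. The sketch below (the angle is arbitrary) inverts the standard matrix of a rotation and compares the result with the matrix of the rotation through the opposite angle, and then shows that the matrix of the projection on the x-axis is singular.

    import numpy as np

    a = 0.5  # an arbitrary angle
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])  # rotation through angle a

    R_inv = np.linalg.inv(R)  # standard matrix of the inverse operator
    R_minus = np.array([[np.cos(-a), -np.sin(-a)],
                        [np.sin(-a),  np.cos(-a)]])  # rotation through -a
    print(np.allclose(R_inv, R_minus))  # True, as in Example 2

    # The projection of Example 3 is not invertible: its matrix is singular.
    P = np.array([[1.0, 0.0], [0.0, 0.0]])
    print(np.linalg.det(P))  # 0.0 -- no inverse matrix exists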
Our next goal is to consider properties of invertible linear operators.
First let us recall some properties of invertible maps (functions). Let
T be a map from set X into set Y. We say that T is
injective or one to one
if T maps different elements to different elements, that is if
T(u)=T(v) then necessarily u=v. We
call T surjective
or onto if every element in Y is an image of some element in
X that is for every y in Y there exists an x in X such that T(x)=y.
A function T from X to X is called invertible if
there exists another function S from X to X such that TS=ST=Id, the identity
function (that is if T takes x to y then S must take y to x). It is easy to see
that T is invertible if and only if it is injective and surjective.
There exist functions which are non-injective and non-surjective (say, T(x) = x^2 from R to R), non-injective and surjective (say, T(x) = x^3 - x from R to R), injective and non-surjective (say, T(x) = arctan(x) from R to R), and injective and surjective (any invertible function, say T(x) = x^3 from R to R).
Thus the following theorem about linear operators is very surprising.
Theorem. For every linear operator T in Rn with standard matrix A the following conditions are equivalent:

1. T is invertible;
2. T is injective;
3. T is surjective;
4. the matrix A is invertible.
Let V and W be arbitrary vector spaces. A map T from V to W is called a linear transformation if

T(u + v) = T(u) + T(v) and T(kv) = kT(v)

for all vectors u and v in V and every scalar k. In the particular case when V = W, T is called a linear operator in V.
We have seen (see the characterization
of linear transformations from Rm to Rn) that linear transformations from
Rm to Rn are precisely the maps which satisfy these conditions. Therefore in
the case of vector spaces of n-vectors this definition is equivalent to the
original definition. Other vector spaces give us more examples of natural
linear transformations.
Positive examples. 1. Let V be the set of all polynomials in one
variable. We shall see later that V is a vector space with the natural
addition and scalar multiplication (it is not difficult to show it directly).
The map which takes each polynomial to its derivative is a linear operator in V, as easily follows from the properties of the derivative: (f + g)' = f' + g' and (kf)' = kf'. (A computational check of this example appears after the list of examples.)
2. Let C[0,1] be the vector space of all continuous functions on the interval
[0,1]. Then the map which takes every function S(x) from C[0,1] to the
function h(x) which is equal to the integral from 0 to x of S(t) is a linear
operator in C[0,1] as follows from the properties of integrals.
3. The map from C[0,1] to R which takes every function S(x) to the number S(1/3) is a linear transformation (here 1/3 can be replaced by any number between 0 and 1).
4. The map from the vector space of all complex numbers C to itself
which
takes every complex number a+bi to its imaginary part bi is a linear operator
(check!).
5. The map from the vector space of all n by n matrices (n is fixed) to
R which takes every matrix A to its (1,1)-entry A(1,1) is a linear
transformation (check!).
6. The map from the vector space of all n by n matrices to R which takes
every matrix A to its trace trace(A) is a linear transformation (check!).
7. The map from an arbitrary vector space V to an arbitrary vector space W
which takes every vector v from V to 0 is a linear transformation (check!).
This transformation is called the null transformation.
8. The map from an arbitrary vector space V to V which takes every vector to itself (the identity map) is a linear operator (check!). It is called the identity operator, denoted I.
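Here is the computational check of Example 1 promised above (a polynomial a0 + a1 x + a2 x^2 + ... is represented by its list of coefficients [a0, a1, a2, ...], a representation chosen just for this illustration). The two linearity conditions hold for the derivative map.

    import numpy as np

    def deriv(p):
        # the derivative of the sum of a_i x^i is the sum of i a_i x^(i-1)
        return np.array([i * p[i] for i in range(1, len(p))])

    p = np.array([1.0, 0.0, 3.0])  # 1 + 3x^2
    q = np.array([0.0, 2.0, 1.0])  # 2x + x^2
    k = 5.0

    print(np.allclose(deriv(p + q), deriv(p) + deriv(q)))  # True: first property
    print(np.allclose(deriv(k * p), k * deriv(p)))         # True: second property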
Negative examples. 1. The map T from C[0,1] to C[0,1] which takes every function S(x) to the function S(x) + 1 is not a linear transformation: take k = 0 and S(x) = x; then the image of kS (the zero function) is the constant function 1, while k times the image of S is the constant function 0. So the second property of linear transformations does not hold.
2. The map T from the vector space of complex numbers C to R which takes every complex number a+bi to its norm sqrt(a^2 + b^2) is not a linear transformation: if we take A = 3 and B = 4i then T(A+B) = ||3+4i|| = 5 while T(A) + T(B) = 3 + 4 = 7, so T(A+B) is not equal to T(A) + T(B) and the first property of linear transformations does not hold.
The following theorem contains some important properties of linear transformations (compare with the corollary from the characterization of linear transformations from Rm to Rn and the theorem about products, sums and scalar multiples of linear transformations).
Theorem. 1. If T is a linear transformation
from V to W then T(0)=0.
2. If T is a linear transformation from V to W and S is a linear
transformation from W to Y (V, W, Y are vector spaces)
then the product (composition) ST is a linear
transformation from V to Y.
3. If T and S are linear transformations from V to W (V and W are vector
spaces) then the sum T+S
which takes every vector A in V to the sum
T(A)+S(A) in W is again a linear transformation from V to W.
4. If T is a linear transformation from V to W and k is a scalar then the
map kT which takes every vector A in V to k times T(A) is again a linear
transformation from V to W.
The proof is left as an exercise.
Some properties of linear transformations, which hold for linear
transformations from Rm to Rn, do not hold for arbitrary vector spaces.
For example let P be the vector space of all polynomials.
Let T be the
linear operator which takes every polynomial to its derivative. Then T is
surjective because every polynomial is a derivative of some other polynomial
(anti-derivatives of a polynomial are polynomials). But T is not injective
because the images of x^2 and x^2 + 1 are the same (2x).
Recall that for linear operators in Rn, injectiveness and surjectiveness are equivalent.
Notice that since the operator T is not injective, it cannot have an
inverse. But let S be the operator on the same space
which takes every polynomial to its anti-derivative int(p(t), t=0..x).
Then for every polynomial p we have TS(p) = p (the derivative of the anti-derivative of a function is the function itself). Thus TS = I. On the other hand, ST is not equal to I: for example, if p = x + 1 then T(p) = 1 and ST(p) = x, so ST(p) is not equal to p.
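This one-sided behavior is easy to see computationally (using the same coefficient-list representation of polynomials as in the sketch above): the derivative T undoes the anti-derivative S, but not the other way around.

    import numpy as np

    def T(p):  # the derivative
        return np.array([i * p[i] for i in range(1, len(p))])

    def S(p):  # the anti-derivative with zero constant term, int(p(t), t=0..x)
        return np.array([0.0] + [p[i] / (i + 1) for i in range(len(p))])

    p = np.array([1.0, 1.0])  # the polynomial x + 1

    print(T(S(p)))  # [1. 1.] -- TS(p) = p
    print(S(T(p)))  # [0. 1.] -- ST(p) = x, not x + 1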
For linear operators in Rm, this cannot happen. Indeed, if TS=I
then the product of standard matrices of T and S is I.
So the standard matrix
A of T is invertible, and the standard matrix B of S is
the inverse of A. Hence S is the inverse of T and ST=I.
The proof in the last paragraph does not have references to the results that we used. Find these references!