An elementary matrix is a matrix obtained from an identity matrix by a single elementary row operation.
Example.

$$\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The first two matrices are obtained by adding a multiple of one row to another row. The third matrix is obtained by swapping two rows. The last matrix is obtained by multiplying a row by a non-zero number.
As we see, elementary matrices usually have lots of zeroes.
Elementary matrices obtained by adding a multiple of one row to another row have 1s on the diagonal and exactly one non-zero entry outside the diagonal.
Elementary matrices obtained by multiplying a row by a non-zero number have exactly one non-unit entry on the diagonal and no non-zero entries outside the diagonal.
Elementary matrices obtained by swapping two rows consist of 0s and 1s and have exactly two non-zero entries outside the diagonal.
The converse statements are also true (for example, every square matrix with 1s on the diagonal and exactly one non-zero entry outside the diagonal is an elementary matrix).
The main result about elementary matrices is that every invertible matrix is a product of elementary matrices. These are in some sense the smallest particles in the world of invertible matrices. We shall prove it later.
Theorem. If the elementary matrix $E$ is obtained by performing a row operation on the identity matrix $I_m$, and if $A$ is an $m \times n$ matrix, then the product $EA$ is the matrix obtained from $A$ by applying the same row operation.
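For instance, let $E$ be the elementary matrix obtained from $I_2$ by adding 2 times row 1 to row 2. Then for an arbitrary $2 \times 2$ matrix $A$:

$$EA = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 2a+c & 2b+d \end{pmatrix},$$

which is exactly what we get by adding 2 times row 1 of $A$ to row 2 of $A$.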
Lemma. Every elementary matrix is invertible and the inverse is again an elementary matrix. If an elementary matrix $E$ is obtained from $I$ by using a certain row operation $q$, then $E^{-1}$ is obtained from $I$ by the "inverse" operation $q^{-1}$ defined as follows: the inverse of adding $c$ times row $i$ to row $j$ is adding $-c$ times row $i$ to row $j$; the inverse of multiplying row $i$ by a non-zero number $c$ is multiplying row $i$ by $1/c$; the inverse of swapping two rows is swapping the same two rows again.
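For example, the inverse of the elementary matrix which adds 2 times row 1 to row 2 is the elementary matrix which adds $-2$ times row 1 to row 2:

$$\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$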
Now we are able to prove the second theorem about inverses.
Theorem. If $A$ is an $n \times n$ square matrix, then the following statements are equivalent: (a) $A$ is invertible; (b) the reduced row echelon form of $A$ is the identity matrix $I_n$; (c) $A$ is a product of elementary matrices.
The proof of this theorem gives an algorithm to represent a matrix as a product of elementary matrices: in order to represent an invertible square matrix $A$ as a product of elementary matrices, one needs to find a sequence of row operations $p_1, \dots, p_m$ which reduces $A$ to its reduced row echelon form, which is the identity matrix; then $A$ is the product of the elementary matrices $E_1^{-1}, \dots, E_m^{-1}$ corresponding to the inverse row operations $p_1^{-1}, \dots, p_m^{-1}$:

$$A = E_1^{-1} E_2^{-1} \cdots E_m^{-1}. \qquad (1)$$
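As a small illustration, take

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$

The row operations $p_1$ (add $-3$ times row 1 to row 2), $p_2$ (multiply row 2 by $-\frac{1}{2}$) and $p_3$ (add $-2$ times row 2 to row 1) reduce $A$ to $I_2$, so

$$A = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.$$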
If we take inverses of both sides of formula (1), then we get the following formula (recall that the inverse of a product is the product of inverses in the opposite order, and that the inverse of $A^{-1}$ is $A$):

$$A^{-1} = E_m E_{m-1} \cdots E_1.$$

Notice that the matrices $E_1, \dots, E_m$ are (by the lemma about elementary matrices) the elementary matrices corresponding to the row operations $p_1, \dots, p_m$. Since $E_1 I = E_1$, we can rewrite the last equality in the following form:

$$A^{-1} = E_m E_{m-1} \cdots E_1 I.$$
Now the theorem about elementary matrices allows us to interpret this equality in the following way: in order to get the inverse $A^{-1}$ of an invertible matrix $A$, one can find a sequence of row operations that reduces $A$ to $I$, and then perform this sequence of operations on $I$.
We can join these two steps by first augmenting $A$ with $I$ (denote the resulting matrix by $[A \mid I]$) and then applying the row operations to the resulting matrix, reducing the left part of it to $I$. The right part will be transformed into $A^{-1}$.
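For the same matrix $A$ as in the illustration above, the procedure looks as follows:

$$\left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & \frac{3}{2} & -\frac{1}{2} \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & \frac{3}{2} & -\frac{1}{2} \end{array}\right),$$

so $A^{-1} = \begin{pmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{pmatrix}$. One can check directly that $A A^{-1} = I_2$.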
A square matrix $A$ is called symmetric if $A^T = A$, that is if $A(i,j) = A(j,i)$ for every $i$ and $j$. Thus $A$ is symmetric if and only if it is symmetric with respect to the main diagonal. Here is an example of a symmetric matrix:
$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}$$
An important
subclass of symmetric matrices is formed by diagonal matrices,
i.e. matrices which have zeroes everywhere outside the main diagonal. For
example, the identity matrix is a diagonal matrix.
We present here three theorems about symmetric matrices.
The product
of symmetric matrices does not need to be symmetric.
Example. Let

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.$$

Both $A$ and $B$ are symmetric but $AB$ is not symmetric.
In fact the following result holds.
Theorem. The product $AB$ of two symmetric matrices $A$ and $B$ is symmetric if and only if $AB = BA$.
We leave the proof of this theorem as an exercise (it is similar to the proof of the first theorem about symmetric matrices).
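For example, the two symmetric matrices

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$$

commute, and indeed $AB = BA = \begin{pmatrix} 5 & 7 \\ 7 & 5 \end{pmatrix}$ is symmetric.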
The following theorem is much harder to prove:
Theorem. A square matrix is symmetric if and only if it is equal to a product $AA^T$ for some square matrix $A$ with possibly complex entries.
I will prove only the easy part of this statement: for every matrix $A$ the product $AA^T$ is symmetric.
Indeed, let $B = AA^T$. Then $B^T = (AA^T)^T$. By the theorem about transposes, the transpose of a product is the product of transposes in the opposite order. Thus $B^T = (A^T)^T A^T$. By the same theorem, $(A^T)^T = A$. Thus $B^T = AA^T = B$, so by definition $B$ is symmetric.
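As a quick check with a concrete matrix:

$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \qquad AA^T = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix},$$

which is indeed symmetric.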
A square matrix $A$ is called upper triangular (resp. lower triangular) if all its entries below (resp. above) the main diagonal are zeroes, that is $A(i,j) = 0$ whenever $i > j$ (resp. $A(i,j) = 0$ whenever $i < j$).
Example.

$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 \\ 2 & 4 & 0 \\ 3 & 5 & 6 \end{pmatrix}$$

The first matrix is upper triangular, the second is lower triangular.
Example:

$$\begin{pmatrix} 0 & 2 & 3 \\ -2 & 0 & 4 \\ -3 & -4 & 0 \end{pmatrix}$$
I leave the proof of this theorem as an exercise.