
2 Conjugate Gradient Algorithm

We will call two vectors $ {\rm\bf x}$ and $ {\rm\bf y}$ $ A$-conjugate if $ {\rm\bf x}^{T}{\rm\bf A}{\rm\bf y}=0$. Notice also that if we denote the characteristic polynomial of $ {\rm\bf A}$ by

$\displaystyle p(\lambda )={\rm det}({\rm\bf A}-\lambda{\rm\bf I}),$ (179)

then the Cayley-Hamilton theorem tells us that $ p({\rm\bf A})=0$ as well. If we suppose $ p(\lambda )=\sum_jp_j\lambda^j$, then

$\displaystyle p_n{\rm\bf A}^n +p_{n-1}{\rm\bf A}^{n-1}+\cdots +p_0{\rm\bf I}=0,$ (180)

and so, as the matrix is non-singular ($ p_0=p(0)={\rm det}({\rm\bf A})\ne 0$),

$\displaystyle {\rm\bf A}^{-1}=-{1\over p_0}[p_1{\rm\bf I}+p_2{\rm\bf A}+\cdots +p_n{\rm\bf A}^{n-1}]$ (181)

and the solution $ {\rm\bf x}$ must be

$\displaystyle {\rm\bf x}=\sum_{j=0}^{n-1}x_j{\rm\bf A}^j{\rm\bf b},$ (182)

where $ x_j=-p_{j+1}/p_0$. Thus the solution can be expressed as a linear combination of the vectors $ {\rm\bf A}^j{\rm\bf b},\ j=0,\ldots ,n-1$. Let $ K_r={\rm span}\{{\rm\bf b},{\rm\bf A}{\rm\bf b},\ldots ,{\rm\bf A}^r{\rm\bf b}\}$; then one approximation strategy is to seek approximations from the subspaces $ K_0, K_1,\ldots$ which successively minimise $ \phi$. After at most $ n$ steps we must have the exact solution, but when $ n$ is very large we hope to obtain a good approximation after only a small number of steps. The obvious way to do this would be, at step $ r$, to approximate from $ K_r$ by finding coefficients $ q_j$ such that if $ {\rm\bf x}_r=\sum_{j=0}^{r}q_j{\rm\bf A}^j{\rm\bf b}$ then $ \phi({\rm\bf x}_r )\equiv \phi(q_0,q_1,\ldots ,q_r)$ is minimised. In this case

$\displaystyle \phi (q_0,\ldots ,q_r) = {1\over 2}\sum_{j=0}^{r}\sum_{k=0}^{r}q_jq_k{\rm\bf b}^{T}{\rm\bf A}^{j+k+1}{\rm\bf b} -\sum_{j=0}^{r}q_j{\rm\bf b}^{T}{\rm\bf A}^j{\rm\bf b},$ (183)

and so setting the derivatives of $ \phi$ with respect to $ q_0,\ldots ,q_r$ to zero would mean solving another matrix system, of growing size, at each iteration. The idea of the conjugate gradient method is to compute each coefficient only once, in an expansion of the solution in terms of a growing set of basis vectors. To do this we need a different basis for $ K_r$. Suppose we call this basis $ \{{\rm\bf d}_0,\ldots ,{\rm\bf d}_r\}$ and that the basis vectors are pairwise $ A$-conjugate. Now if we let $ {\rm\bf x}_r=\sum_{j=0}^{r}q_j{\rm\bf d}_j$ then

$\displaystyle \phi (q_0,\ldots ,q_r)=\sum_{j=0}^{r}[{1\over 2}q_j^2{\rm\bf d}_j^{T}{\rm\bf A}{\rm\bf d}_j - q_j{\rm\bf d}_j^{T}{\rm\bf b}],$ (184)

so now the minimisation problem is almost trivial: setting $ \partial\phi/\partial q_j=0$ in (184) gives

$\displaystyle q_j={{{\rm\bf d}_j^{T}{\rm\bf b}}\over{{\rm\bf d}_j^{T}{\rm\bf A}{\rm\bf d}_j}},$

so each component $ q_j$ is independent of the others and, at each step, the existing components do not change; we calculate the component in each direction only once. Of course we do have to construct the basis $ \{{\rm\bf d}_0,\ldots ,{\rm\bf d}_r\}$, but since it has to span $ K_r$ and at the $ r^{th}$ step we already have an $ A$-conjugate basis for $ K_{r-1}$, all we have to do is compute one additional basis vector at each step.
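As an aside, the fact that started this whole discussion, namely that the exact solution lies in $ {\rm span}\{{\rm\bf b},{\rm\bf A}{\rm\bf b},\ldots ,{\rm\bf A}^{n-1}{\rm\bf b}\}$ with the coefficients of (182), is easy to check numerically. The sketch below (Python with numpy) is only an illustration: the matrix and right-hand side are arbitrary, and numpy.poly returns the coefficients of $ {\rm det}(\lambda{\rm\bf I}-{\rm\bf A})$ rather than of (179), so the signs are adjusted accordingly.

import numpy as np

# Small symmetric positive definite example (values chosen arbitrarily).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
n = A.shape[0]

# np.poly gives the coefficients of det(lambda*I - A), highest power first,
# so the Cayley-Hamilton relation reads A^n + c[1] A^(n-1) + ... + c[n] I = 0.
c = np.poly(A)

# Hence A^(-1) b = -(1/c[n]) * sum_{j=0}^{n-1} c[n-1-j] A^j b,
# i.e. the solution lies in span{b, A b, ..., A^(n-1) b}.
x = np.zeros(n)
Ajb = b.copy()                      # current Krylov vector A^j b
for j in range(n):
    x += -c[n - 1 - j] / c[n] * Ajb
    Ajb = A @ Ajb

print(np.allclose(A @ x, b))        # True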


\begin{theorem}
If $ {\rm\bf x}_{r-1}$ minimises $ \phi$ over all vectors in $ K_{r-1}$ and $ {\rm\bf d}_r$ is $ A$-conjugate to $ {\rm\bf d}_0,\ldots ,{\rm\bf d}_{r-1}$, then $ {\rm\bf x}_r={\rm\bf x}_{r-1}+\alpha_r{\rm\bf d}_r$ minimises $ \phi$ over $ K_r$ when

$\displaystyle \alpha_r={{{\rm\bf d}_r^{T}{\rm\bf r}_{r-1}}\over{{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_r}},$

where $ {\rm\bf r}_{r-1}={\rm\bf b}-{\rm\bf A}{\rm\bf x}_{r-1}$ is the residual.
\end{theorem}

The proof is straightforward. It is almost trivial from our discussion above that

$\displaystyle \alpha_r={{{\rm\bf d}_r^{T}{\rm\bf b}}\over{{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_r}}.$ (185)

Now $ {\rm\bf r}_{r-1}={\rm\bf b}-{\rm\bf A}{\rm\bf x}_{r-1}$, so that $ {\rm\bf b}={\rm\bf r}_{r-1}+{\rm\bf A}{\rm\bf x}_{r-1}$ and, as $ {\rm\bf d}_r$ is $ A$-conjugate to all vectors in $ K_{r-1}$, including $ {\rm\bf x}_{r-1}$, we must have $ {\rm\bf d}_r^{T}{\rm\bf b}={\rm\bf d}_r^{T}{\rm\bf r}_{r-1}$.
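Written out,

$\displaystyle {\rm\bf d}_r^{T}{\rm\bf b}={\rm\bf d}_r^{T}{\rm\bf r}_{r-1}+{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf x}_{r-1}={\rm\bf d}_r^{T}{\rm\bf r}_{r-1},$

since $ {\rm\bf x}_{r-1}$ is a linear combination of $ {\rm\bf d}_0,\ldots ,{\rm\bf d}_{r-1}$, each of which is $ A$-conjugate to $ {\rm\bf d}_r$.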

This enables us to carry out one iteration of the Conjugate Gradient algorithm.

Suppose $ {\rm\bf x}_{r-1}\in K_{r-1}={\rm span}\{{\rm\bf b},{\rm\bf A}{\rm\bf b},\ldots ,{\rm\bf A}^{r-1}{\rm\bf b}\}\equiv {\rm span}\{{\rm\bf d}_0,\ldots ,{\rm\bf d}_{r-1}\}$ minimises $ \phi$ over $ K_{r-1}$ and that the basis vectors $ {\rm\bf d}_0,\ldots ,{\rm\bf d}_{r-1}$ are $ A$-conjugate. Calculate $ {\rm\bf r}_{r-1}={\rm\bf b}-{\rm\bf A}{\rm\bf x}_{r-1}\in K_r$ and suppose

$\displaystyle {\rm\bf d}_r={\rm\bf r}_{r-1}+\beta_r{\rm\bf d}_{r-1},$ (186)

where $ \beta_r$ is chosen to make $ {\rm\bf d}_r$ $ A$-conjugate to $ {\rm\bf d}_{r-1}$,

$\displaystyle \beta_r=-{{{\rm\bf r}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}}\over{{\rm\bf d}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}}}.$ (187)
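This value is forced by the conjugacy requirement:

$\displaystyle {\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_{r-1}=({\rm\bf r}_{r-1}+\beta_r{\rm\bf d}_{r-1})^{T}{\rm\bf A}{\rm\bf d}_{r-1}={\rm\bf r}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}+\beta_r\,{\rm\bf d}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}=0.$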

If

$\displaystyle \alpha_r={{{\rm\bf d}_r^{T}{\rm\bf r}_{r-1}}\over{{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_r}},$ (188)

is chosen as the step length, then

$\displaystyle {\rm\bf x}_r={\rm\bf x}_{r-1}+\alpha_r{\rm\bf d}_r,$ (189)

minimises $ \phi$ over $ K_r$ and

$\displaystyle {\rm\bf r}_{r-1}^{T}{\rm\bf d}_s={\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_s=0,\ \ s=0,\ldots ,r-1, \qquad {\rm\bf r}_{r-1}^{T}{\rm\bf r}_s=0,\ \ s=0,\ldots ,r-2.$ (190)

Note that we can also show that

$\displaystyle \alpha_r={{{\rm\bf d}_r^{T}{\rm\bf r}_{r-1}}\over{{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_r}}={{{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-1}}\over{{\rm\bf d}_r^{T}{\rm\bf A}{\rm\bf d}_r}},$ (191)

and

$\displaystyle \beta_r= -{{{\rm\bf r}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}}\over{{\rm\bf d}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}}}={1\over\alpha_{r-1}}\,{{{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-1}}\over{{\rm\bf d}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}}}.$ (192)
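The second equality in (192) comes from the residual update: since $ {\rm\bf x}_{r-1}={\rm\bf x}_{r-2}+\alpha_{r-1}{\rm\bf d}_{r-1}$,

$\displaystyle {\rm\bf r}_{r-1}={\rm\bf r}_{r-2}-\alpha_{r-1}{\rm\bf A}{\rm\bf d}_{r-1} \quad\Longrightarrow\quad {\rm\bf r}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}={{{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-2}-{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-1}}\over{\alpha_{r-1}}}=-{{{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-1}}\over{\alpha_{r-1}}},$

where $ {\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-2}=0$ by (190). Combining (191) at step $ r-1$, that is $ \alpha_{r-1}{\rm\bf d}_{r-1}^{T}{\rm\bf A}{\rm\bf d}_{r-1}={\rm\bf r}_{r-2}^{T}{\rm\bf r}_{r-2}$, with (192) gives the convenient ratio $ \beta_r={{\rm\bf r}_{r-1}^{T}{\rm\bf r}_{r-1}}/{{\rm\bf r}_{r-2}^{T}{\rm\bf r}_{r-2}}$, which is the form used in the algorithm below.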

It is worth observing that we only ever need to store four vectors, $ {\rm\bf x}_{r-1},\ {\rm\bf d}_{r-1},\ {\rm\bf r}_{r-1},\ {\rm\bf A}{\rm\bf d}_{r-1}$, each of which can be overwritten as the iteration proceeds, and apart from the matrix-vector product $ {\rm\bf A}{\rm\bf d}_{r-1}$ there are only inner products to evaluate. The algorithm is very compact,

Algorithm

\begin{displaymath}\begin{array}{l l} {\rm\bf x}=0&\\  {\rm\bf r}={\rm\bf b}&\\  {\rm\bf d}={\rm\bf r}&\\  {\rm while}\ \Vert{\rm\bf r}\Vert\ {\rm too\ large}&\\  &{\rm\bf w}={\rm\bf A}{\rm\bf d}\\  &\rho ={\rm\bf r}^{T}{\rm\bf r}\\  &\alpha =\rho /({\rm\bf d}^{T}{\rm\bf w})\\  &{\rm\bf x}={\rm\bf x}+\alpha{\rm\bf d}\\  &{\rm\bf r}={\rm\bf r}-\alpha{\rm\bf w}\\  &\beta =({\rm\bf r}^{T}{\rm\bf r})/\rho\\  &{\rm\bf d}={\rm\bf r}+\beta{\rm\bf d}\\  {\rm end}\ {\rm while}&\\  \end{array}\end{displaymath} (193)
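A direct transcription of (193) into Python (a minimal sketch using numpy; the stopping tolerance, the iteration cap and the test matrix are arbitrary choices, not taken from these notes) might look as follows:

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A, following (193)."""
    n = b.shape[0]
    if max_iter is None:
        max_iter = n                    # exact in at most n steps, in exact arithmetic
    x = np.zeros(n)
    r = b.copy()                        # residual r = b - A x
    d = r.copy()                        # first search direction
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        w = A @ d                       # the only matrix-vector product per step
        rho = r @ r
        alpha = rho / (d @ w)           # step length, equation (191)
        x = x + alpha * d
        r = r - alpha * w               # update the residual without forming A x
        beta = (r @ r) / rho            # ratio form of (192), derived above
        d = r + beta * d                # new A-conjugate direction, equation (186)
    return x

# Example use on an arbitrary small SPD system:
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))            # True

In exact arithmetic the loop terminates in at most $ n$ steps, as discussed above, but in practice it is stopped as soon as the residual norm is small enough.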

We do not have time to develop an error analysis, but we can note that if the condition number of the matrix $ {\rm\bf A}$ is $ \kappa$, then the error changes by a factor less than or equal to

$\displaystyle {{\sqrt{\kappa}-1}\over{\sqrt{\kappa}+1}},$ (194)

at each iteration. This is considerably better than for steepest descent. Even further advantage can be gained by preconditioning the matrix $ {\rm\bf A}$, that is, by pre-multiplying the equation $ {\rm\bf A}{\rm\bf x}={\rm\bf b}$ by a suitably chosen additional matrix. We shall not consider that aspect in this course.
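To make the comparison with steepest descent above concrete (its corresponding error-reduction factor is $ (\kappa -1)/(\kappa +1)$, a standard result we do not derive here), the following short calculation, for an illustrative condition number $ \kappa =10^4$, counts how many iterations each bound needs in order to shrink by a factor of $ 10^{-6}$:

import math

kappa = 1.0e4                                                  # illustrative condition number
cg_factor = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)    # factor (194)
sd_factor = (kappa - 1) / (kappa + 1)                          # steepest descent factor

target = 1.0e-6                                                # desired reduction of the bound
print(math.ceil(math.log(target) / math.log(cg_factor)))       # 691 iterations
print(math.ceil(math.log(target) / math.log(sd_factor)))       # 69078 iterations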

