3 Finite Element Method

The finite element method now has quite a long and interesting history. It was originally developed by engineers dealing with structural analysis. They realised that if, say, a loaded plate was divided into small triangles or quadrilaterals (`elements') and the displacement within an element was assumed to be linear in the space variables, then the stress could be calculated throughout the element and the sum of all the stresses would have to agree with the loading. Hence they obtained a linear combination of displacements equal to the loading, and inverting this system gave the displacements and then the stresses in the structure. It was quickly realised that this very practical method could be put in a sound mathematical framework, and once that framework was developed, the method could be applied to problems other than structural analysis. We shall look only at the modern understanding of finite elements. It is important to distinguish our understanding of how finite element analysis approximates the solution of a differential system from the details of how the method is implemented.

We seek a solution of $ {\mathcal L}u=f$ on $ x\in\Omega$, together with its associated boundary conditions. In finite differences we tried to find a set of values {$ U_j$} which approximated $ u$ at a set of points in the domain. Now ask a different question: given a set of simple functions, can we approximate $ u$ using these functions? In abstract terms, if $ u\in {\mathcal A}$ and we have a set of (possibly) simple functions $ {\mathcal B}\subset{\mathcal A}$, can we find a function $ U\in{\mathcal B}$ which is a good approximation to $ u$? This immediately points us towards needing some way of measuring how well one function represents another: that is, we need some way of measuring distance in the set of functions $ {\mathcal A}$. One way to achieve this is to use an inner product; for instance, if $ \omega >0$ we can define

$\displaystyle (u,v)=\int_{\Omega}\omega (x)u(x)v(x){\rm d}x,$ (55)

to be an inner product and then we can measure the size of a function by

$\displaystyle \vert\vert u\vert\vert_2=\sqrt{(u,u)},\ u\in{\mathcal A}$ (56)

and `distance' between functions by

$\displaystyle d(u,v)=\vert\vert u-v\vert\vert_2,\ \ u,v\in {\mathcal A}.$ (57)

The definition of `distance' is not unique; in the inner product example, the choice of weight function $ \omega$ will affect the calculation of `distance'.
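As a small illustration (a minimal sketch, not from the notes: the weight $ \omega$, the functions $ u,v$ and the use of NumPy/SciPy are illustrative choices), the inner product (55), norm (56) and distance (57) can be evaluated by numerical quadrature:

import numpy as np
from scipy.integrate import quad

omega = lambda x: 1.0 + x                  # any positive weight function (illustrative)
u = lambda x: np.sin(np.pi * x)            # illustrative functions
v = lambda x: x * (1.0 - x)

def inner(f, g):
    """(f, g) = integral over [0,1] of omega*f*g dx, cf. equation (55)."""
    return quad(lambda x: omega(x) * f(x) * g(x), 0.0, 1.0)[0]

norm = lambda f: np.sqrt(inner(f, f))              # ||f||_2, equation (56)
dist = lambda f, g: norm(lambda x: f(x) - g(x))    # d(f, g), equation (57)

print(dist(u, v))    # changing omega changes the computed 'distance'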

The approximation problem can now be posed very precisely: Given $ u\in {\mathcal A}$ and a subset $ {\mathcal B}\subset{\mathcal A}$ can we find $ U\in{\mathcal B}$ such that

$\displaystyle \vert\vert u-U\vert\vert_2\le \vert\vert u-V\vert\vert_2,\ \ \forall V\in{\mathcal B}?$ (58)

In the case of a norm based on an inner product we can go some way towards answering this question. Since

$\displaystyle \vert\vert u-V\vert\vert_2^2=\vert\vert u-U+U-V\vert\vert_2^2=(u-U+U-V,u-U+U-V),$ (59)

we have

$\displaystyle \vert\vert u-V\vert\vert_2^2=\vert\vert u-U\vert\vert_2^2 +\vert\vert U-V\vert\vert_2^2+2(u-U,U-V).$ (60)

The last term could be positive or negative, whereas the second last term is always non-negative, so in order to guarantee

$\displaystyle \vert\vert u-U\vert\vert_2\le \vert\vert u-V\vert\vert_2\ \ \forall V\in{\mathcal B},$ (61)

it is enough to require (taking $ W=U-V$, which lies in $ {\mathcal B}$ when $ {\mathcal B}$ is a linear subspace) that

$\displaystyle (u-U,W)=0\ \ \forall W\in{\mathcal B}.$ (62)

This is now called Galerkin orthogonality. It is just a generalisation of our geometric intuition: if you want the point on a straight line closest to a curve, you make the line joining that point to the curve perpendicular to the straight line.
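As a concrete (and entirely illustrative) example of this orthogonality condition, the sketch below takes the unweighted inner product ($ \omega\equiv 1$) on $ [0,1]$, $ u=\sin\pi x$, and $ {\mathcal B}$ the span of $ 1,x,x^2$; imposing $ (u-U,\phi_r)=0$ for each basis function gives a small linear system for the best approximation $ U$:

import numpy as np
from scipy.integrate import quad

u = lambda x: np.sin(np.pi * x)
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]    # spans the subspace B

ip = lambda f, g: quad(lambda x: f(x) * g(x), 0.0, 1.0)[0]   # (f, g) with omega = 1

G = np.array([[ip(p, q) for q in basis] for p in basis])     # Gram matrix (phi_j, phi_r)
rhs = np.array([ip(u, p) for p in basis])                    # (u, phi_r)
c = np.linalg.solve(G, rhs)                                  # coefficients of U

U = lambda x: sum(cj * p(x) for cj, p in zip(c, basis))
# Galerkin orthogonality (62): the error u - U is orthogonal to every phi_r
print([ip(lambda x: u(x) - U(x), p) for p in basis])         # all close to zero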

Before we finally deal with the finite element method we need to look again at the differential equation. We have already seen that many systems come from conservation laws, which deal with integrals of quantities over volumes. We have also seen that in the finite volume method we reversed this by looking at an integral of the equation over a small `volume'. Now we generalise this further: instead of requiring $ {\mathcal L}u-f=0$ pointwise, we take a set of functions $ {\mathcal A}_0=\{w\in{\mathcal A}\ \vert\ w(0)=w(1)=0\}$ and require the inner product of the equation with every function in this set to be zero:

$\displaystyle ({\mathcal L}u-f,w)=0,\ \ \forall w\in {\mathcal A}_0.$ (63)

This is usually called a weak form of the equation. One reason for doing this is that we can use integration by parts to reduce the highest derivative of $ u$ which occurs. Another reason is that many conservation laws have solutions which give an extremum of some functional, and these fall very naturally into this framework.

In the case of our model problem, a solution of the differential equation needs to be twice differentiable. If we can integrate by parts we need only consider functions which have square-integrable derivatives (so the derivative does not need to be continuous). There are special symbols for this space of functions and some of its subspaces.

$\displaystyle H^1[0,1]=\{{\rm functions\ }u{\rm\ on\ }[0,1]{\rm\ with\ } \int_0^1({u'}^2+u^2){\rm d}x\ {\rm bounded}\}.$ (64)

We will usually just write $ H^1$ and leave the domain ambiguous.

$\displaystyle H^1_E=\{u\in H^1\vert\ u(0)=a,\ u(1)=b\},$ (65)

$\displaystyle H^1_0=\{u\in H^1\vert\ u(0)=0,\ u(1)=0\}.$ (66)

The weak form of our equation is

$\displaystyle (-u''+\kappa^2u-f,w)=0,\ \forall w\in H^1_0,$ (67)

and if we integrate by parts

$\displaystyle (u',w')+\kappa^2(u,w)-(f,w)=0.$ (68)
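To see where this comes from, take the weight $ \omega\equiv 1$ in the inner product and integrate the second derivative term of (67) by parts; the boundary term vanishes because $ w\in H^1_0$:

$\displaystyle \int_0^1(-u'')w\,{\rm d}x=\bigl[-u'w\bigr]_0^1+\int_0^1u'w'\,{\rm d}x=(u',w'), \quad{\rm since\ }w(0)=w(1)=0.$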

The variational form of the equation is: find $ u\in H^1_E$ such that

$\displaystyle J[u]=\int_0^1[{1\over 2}({u'}^2+\kappa^2u^2)-fu]{\rm d}x,$ (69)

is an extremum. We shall not take this further here.

If we define a new inner product:

$\displaystyle a(u,v)=(u',v')+\kappa^2(u,v),$ (70)

this can be used to define a new norm, sometimes called an energy norm,

$\displaystyle \vert\vert u\vert\vert_a=\sqrt{a(u,u)},$ (71)

and the weak form of the equation is

$\displaystyle a(u,w)=(f,w)\ \ \forall w\in H^1_0.$ (72)

This has now established the framework within which we can apply finite elements. Instead of looking for the solution of a differential equation, we are seeking a function $ u\in H^1_E$ (as it is in $ H^1_E$ it automatically satisfies the boundary conditions) which satisfies the weak form, an integral equation, for all functions in $ H^1_0$.

We now wish to approximate the function $ u$ throughout the whole interval, $ [0,1]$, on which it is defined. It is logical to look at functions which are already in $ H^1_E$; for instance, we might choose functions which are piecewise linear. In the background, though it does not need to be specified at this point, is a mesh (or set of elements) covering the interval, with some characteristic size, say $ h$. For instance, the simplest set of elements would come from dividing the interval by the set of points $ x_j=jh,\ j=0,1,\ldots ,m$, where $ h=1/m$, so that the elements are the intervals $ [jh,(j+1)h],\ j=0,\ldots ,m-1$. Suppose we have chosen a set of approximating functions in $ H^1$ and we let

$\displaystyle S^h_E=\{U\in H^1 \ \vert\ U(0)=a,\ U(1)=b,\ {\rm U\ a\ linear\ combination\ of \ approximating\ functions}\},$ (73)

$\displaystyle S^h_0=\{W\in H^1 \ \vert\ W(0)=0,\ W(1)=0,\ {\rm W\ a\ linear\ combination\ of \ approximating\ functions}\}.$ (74)

We see that $ S^h_E\subset H^1_E$, and $ S^h_0\subset H^1_0$.

Now $ U\in S^h_E$ will be a best approximation to $ u\in H^1_E$ measured in the $ a$-norm if the error is orthogonal (orthogonality measured using the $ a$-inner product) to all functions in $ S^h_0$,

$\displaystyle a(u-U,W)=0,\ \forall W\in S^h_0.$ (75)

Now $ a(\cdot,\cdot)$ is linear in each argument, and since $ S^h_0\subset H^1_0$ the weak form satisfied by $ u$ gives $ a(u,W)=(f,W)$ for every $ W\in S^h_0$, so the orthogonality condition is

$\displaystyle a(U,W)=(f,W)\ \forall W\in S^h_0.$ (76)

This is the essence of the finite element method: if we can find $ U$ then it is a best approximation to $ u$ measured in a particular norm (in this case the $ a$-norm).

To make this useful we have to carry on a bit to understand how we can calculate $ U$. We suppose that we use a set of linearly independent basis functions $ \phi_0,\ldots ,\phi_m$ (linear independence measured using the $ a(.,.)$ inner product) such that

$\displaystyle S^h_E=\{U\in H^1_E\ \vert\ U=\sum_{j=0}^{m}U_j\phi_j\ {\rm with}\ U(0)=a,\ U(1)=b\},$ (77)

$\displaystyle S^h_0=\{W\in H^1_0\ \vert\ W=\sum_{j=0}^{m}W_j\phi_j\ {\rm with}\ W(0)=W(1)=0\},$ (78)

then since $ S^h_0$ is a linear subspace the condition which determines $ U$ can also be written

$\displaystyle a(U,\phi_r)=(f,\phi_r)\ r=1,\ldots ,m-1,$ (79)

and if we substitute the form of $ U$ in terms of the basis functions,

$\displaystyle U=\sum_{j=0}^mU_j\phi_j,$ (80)

then

$\displaystyle \sum_{j=0}^mU_ja(\phi_j,\phi_r)=(f,\phi_r),\ r=1,\ldots ,m-1.$ (81)

This is just a matrix equation for the vector $ {\rm\bf U}^{\rm T}=(U_0,U_1,\ldots ,U_m)$ (it is incomplete: it needs the addition of the two boundary conditions to produce a full system). This is still a very general framework, so we now focus on one example implementation of it.
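For illustration (this particular layout is an assumption, not spelled out in the notes), one common way to complete the system is to replace the first and last equations by the boundary conditions $ U_0=a$ and $ U_m=b$, giving

$\displaystyle \left(\begin{array}{c c c c} 1&0&\cdots &0\\ a(\phi_0,\phi_1)&a(\phi_1,\phi_1)&\cdots &a(\phi_m,\phi_1)\\ \vdots & & &\vdots \\ 0&\cdots &0&1 \end{array}\right) \left(\begin{array}{c} U_0\\ U_1\\ \vdots \\ U_m \end{array}\right)= \left(\begin{array}{c} a\\ (f,\phi_1)\\ \vdots \\ b \end{array}\right).$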

Suppose the functions $ \phi_r$ are piecewise linear functions,

$\displaystyle \phi_r(x)=\left\{\begin{array}{l l} 0 & x\le (r-1)h ,\\ {{x-(r-1)h}\over h} & (r-1)h< x\le rh,\\ {{(r+1)h-x}\over h} & rh< x\le (r+1)h ,\\ 0 & x>(r+1)h. \end{array}\right.$ (82)
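In code, these `hat' functions are easy to evaluate (a minimal sketch assuming NumPy; the mesh is the uniform one $ x_j=jh$ described above):

import numpy as np

def phi(r, x, h):
    # Hat function of equation (82): centred on x = r*h, supported on [(r-1)h, (r+1)h].
    x = np.asarray(x, dtype=float)
    rising  = (x - (r - 1) * h) / h          # on ((r-1)h, rh]
    falling = ((r + 1) * h - x) / h          # on (rh, (r+1)h]
    return np.where((x > (r - 1) * h) & (x <= r * h), rising,
           np.where((x > r * h) & (x <= (r + 1) * h), falling, 0.0))

h = 0.1
print(phi(3, np.array([2 * h, 3 * h, 4 * h]), h))   # ~[0, 1, 0]: 1 at its own node, 0 at neighbours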

We next need to evaluate the numbers $ a(\phi_r,\phi_s)=(\phi_r',\phi_s')+\kappa^2(\phi_r,\phi_s).$

(a) $ A_{r,s}=(\phi_r',\phi_s')$

If $ s<r-1$ or $ s>r+1$ then $ A_{r,s}=0$; if $ s=r-1$ or $ s=r+1$, $ A_{r,s}=-1/h$; if $ s=r$, $ A_{r,s}=2/h$.

(b) $ B_{r,s}=(\phi_r,\phi_s)$. If $ s<r-1$ or $ s>r+1$ then $ B_{r,s}=0$; if $ s=r-1$ or $ s=r+1$, $ B_{r,s}=h/6$; if $ s=r$, $ B_{r,s}=2h/3$.
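As a check on two of these values (a worked detail, not in the original notes): on each of the two elements making up the support of $ \phi_r$ the derivative $ \phi_r'$ is $ \pm 1/h$, so

$\displaystyle A_{r,r}=\int_{(r-1)h}^{(r+1)h}(\phi_r')^2\,{\rm d}x={{2h}\over{h^2}}={2\over h}, \qquad B_{r,r+1}=\int_{rh}^{(r+1)h}{{(r+1)h-x}\over h}\,{{x-rh}\over h}\,{\rm d}x={h\over 6}.$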

If we define

$\displaystyle \tilde f_j={1\over h}\int_{(j-1)h}^{(j+1)h}\phi_j(x)f(x){\rm d}x,$ (83)

then we must have

$\displaystyle -{1\over h}U_{r+1}+{2\over h}U_r-{1\over h}U_{r-1}+ \kappa^2\left({h\over 6}U_{r+1}+{{2h}\over 3}U_r+{h\over 6}U_{r-1}\right)=h\tilde f_r,\ r=1,\ldots ,m-1.$ (84)

If we define a `mass' matrix $ {\rm\bf M}$ by

$\displaystyle {\rm\bf M}={1\over 6}\left(\begin{array}{c c c c c c} 0&0&0&\cdots &0&0\\ 1&4&1&\cdots &0&0\\ 0&1&4&\cdots &0&0\\ \vdots &\vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&0&\cdots &4&1\\ 0&0&0&\cdots &0&0 \end{array}\right),$ (85)

and use the same matrix $ {\rm\bf K}$ as occurred in the finite difference and finite volume formulations, then in our model problem the vector $ {\rm\bf U}$ is determined by

$\displaystyle ({1\over h^2}{\rm\bf K}+\kappa^2{\rm\bf M}){\rm\bf U}=\tilde{\rm\bf f}.$ (86)
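A minimal sketch of assembling and solving (86) (assuming NumPy/SciPy; the parameter values, the right-hand side $ f=1$ and the homogeneous boundary values are illustrative choices, not taken from the notes):

import numpy as np
from scipy.integrate import quad

m, kappa, a, b = 20, 1.0, 0.0, 0.0
h = 1.0 / m
f = lambda x: 1.0                            # illustrative right-hand side

K = np.zeros((m + 1, m + 1))                 # tridiag(-1, 2, -1) on the interior rows
M = np.zeros((m + 1, m + 1))                 # mass matrix of equation (85)
for r in range(1, m):
    K[r, r - 1:r + 2] = [-1.0, 2.0, -1.0]
    M[r, r - 1:r + 2] = np.array([1.0, 4.0, 1.0]) / 6.0

# f~_r of equation (83): (1/h) times the integral of phi_r*f over the support of phi_r
phi = lambda r, x: max(0.0, 1.0 - abs(x - r * h) / h)
ftilde = np.array([quad(lambda x, r=r: phi(r, x) * f(x),
                        max(0.0, (r - 1) * h), min(1.0, (r + 1) * h))[0] / h
                   for r in range(m + 1)])

A = K / h**2 + kappa**2 * M                  # the matrix of equation (86)
A[0, :] = 0.0; A[0, 0] = 1.0; ftilde[0] = a  # boundary rows impose U_0 = a
A[m, :] = 0.0; A[m, m] = 1.0; ftilde[m] = b  # and U_m = b
U = np.linalg.solve(A, ftilde)

# compare with the exact solution of -u'' + kappa^2 u = 1, u(0) = u(1) = 0
x = np.linspace(0.0, 1.0, m + 1)
exact = (1.0 - np.cosh(kappa * (x - 0.5)) / np.cosh(kappa / 2.0)) / kappa**2
print(np.max(np.abs(U - exact)))             # the nodal error is small and shrinks as m grows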

1 Note 1 Error Analysis

We are not able to go into the error analysis of finite element methods in great detail; there are very readable sections in Eriksson et al. which you are advised to read through. One key idea is as follows: since we have a best approximation (measured in the energy norm), the error in that norm must be bounded by the error for any other approximating function, so we choose an interpolating approximation, since Lagrange's theorem tells us the error in interpolation. Mathematically, since

$\displaystyle \vert\vert u-U\vert\vert_a\le \vert\vert u-V\vert\vert_a,\ \forall V\in S^h_E,$ (87)

using the same piecewise linear functions $ \phi_r$, choose

$\displaystyle V=\sum_{r=0}^mu_r\phi_r(x),$ (88)

and Lagrange's theorem gives

$\displaystyle {\rm on\ }(x_i<x\le x_{i+1})\ :\ u-V={1\over 2}(x-x_i)(x-x_{i+1})u''(\xi_i),$ (89)

where $ x_i<\xi_i<x_{i+1}$ so that

$\displaystyle \vert u-V\vert\le {1\over 8}(x_{i+1}-x_i)^2\vert\vert u''\vert\vert_\infty,$ (90)

and similarly

$\displaystyle \vert u'-V'\vert\le (x_{i+1}-x_i)\vert\vert u''\vert\vert_\infty,$ (91)

so we have an a priori error estimate

$\displaystyle \vert\vert u-U\vert\vert_a^2\le \left[h^2+{1\over{64}}\kappa^2h^4\right]\vert\vert u''\vert\vert_\infty^2.$ (92)
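This bound follows by combining (87) with the interpolation estimates (90) and (91): using $ V$ as the comparison function,

$\displaystyle \vert\vert u-U\vert\vert_a^2\le\vert\vert u-V\vert\vert_a^2 =\int_0^1\bigl[(u'-V')^2+\kappa^2(u-V)^2\bigr]{\rm d}x \le h^2\vert\vert u''\vert\vert_\infty^2+{1\over{64}}\kappa^2h^4\vert\vert u''\vert\vert_\infty^2.$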

This may appear a little weak, since it indicates an error in the energy norm which is only first order in $ h$, but with more analysis better error estimates can be found.

2 Note 2 A posteriori error analysis and mesh adaptation

Once we have an approximate solution $ U(x)$, one sensible question is `how well does $ U$ satisfy the original differential equation?' Of course this needs to be carefully thought out, because there will be a set of points (the boundaries of elements) where $ U$ cannot be differentiated sufficiently to be substituted into the equation. As this is only a finite set of points, we can still calculate integrals across the domain.

We start by defining a residual function

$\displaystyle R(U)=f-{\mathcal L}U,$ (93)

on each interval $ (x_i,x_{i+1})$ and a mesh function $ h(x)$ by

$\displaystyle h(x)=x_{i+1}-x_i, \ \ x_i\le x < x_{i+1},$ (94)

so with a uniform mesh $ h$ is just a constant, but with a variable mesh it gives the local mesh size.

Intuitively, a measure of how well the approximation represents the solution is given by an integral of the residual across the domain; where the residual is relatively large we should think of refining the mesh to reduce it, and where the residual is relatively small the mesh could be coarsened. We cannot go into the mathematics too deeply, but it is possible to derive results for our model problem of the form

$\displaystyle \vert\vert u'-U'\vert\vert_2\le C\vert\vert h(x)R(U)\vert\vert_2,$ (95)

for some positive constant $ C$. This means we can produce an algorithm to adapt the mesh to a particular problem and to achieve a given tolerance for some measure of the error. In this case we can seek to make $ M=\vert\vert hR(U)\vert\vert_2^2$ less than some tolerance by firstly moving mesh points, and then increasing the number of mesh points.
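To illustrate how such an indicator might be evaluated (a minimal sketch assuming NumPy/SciPy, with $ \omega\equiv 1$; the residual function and the two meshes below are illustrative choices, not the worked example that follows): since $ h(x)$ is piecewise constant, $ M=\vert\vert h R(U)\vert\vert_2^2$ is just a sum over elements of $ h_i^2\int_{x_i}^{x_{i+1}}R(U)^2{\rm d}x$.

import numpy as np
from scipy.integrate import quad

def indicator(mesh, R):
    # M = ||h(x) R(U)||_2^2 = sum over elements of h_i^2 * integral of R^2
    M = 0.0
    for xl, xr in zip(mesh[:-1], mesh[1:]):          # loop over elements
        h = xr - xl                                  # local mesh size h(x)
        M += h**2 * quad(lambda x: R(x)**2, xl, xr)[0]
    return M

R = lambda x: np.exp(3.0 * x)                        # illustrative residual
uniform = np.linspace(0.0, 1.0, 5)
graded  = np.array([0.0, 0.5, 0.75, 0.9, 1.0])       # smaller elements where R is large
print(indicator(uniform, R), indicator(graded, R))   # the graded mesh gives the smaller M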

As an example, consider the case $ \kappa=0$ and approximation using piecewise linear functions, when the residual is just $ R(U)=f$, and we take $ f=1,\ 0\le x <{2\over 3}$, $ f=100,\ {2\over 3}\le x \le 1$. If we take four points uniformly across the interval, then $ M=1111$. If however we move the mesh so the four points are at $ x=0,{2\over 3},{5\over 6}, 1$, then $ M=93$, and if we add one more point, so the points are at $ x=0,{2\over 3},{7\over 9}, {8\over 9},1$, then $ M=14$. If five points were to be uniformly distributed across the interval $ [0,1]$ then $ M=208$. Since such large gains can be made using an algorithmic approach, mesh adaptation is becoming more important in the numerical solution of PDEs.

