Brief Background on Preconditioners

The choice of a preconditioner involves the selection of a matrix Q, called the splitting matrix, such that the preconditioned system

\begin{displaymath}Q^{-1}Au = Q^{-1}b\end{displaymath}

is better conditioned than the original system, Au = b. Clearly, one requirement for Q is that it be easily invertible (i.e., linear systems having Q as the coefficient matrix can be solved with much less effort than the original system Au = b). To reduce the number of iterations, it is also desirable that Q be ``close" to A in the sense that the spectral radius of $I - Q^{-1}A$ is small. Thus, in choosing a preconditioner, the user must choose between methods that perform a large number of ``cheap" iterations and methods that perform a small number of ``expensive" iterations. NSPCG provides a large number of preconditioning methods to aid the user in selecting a preconditioner for a production code.
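
To see the effect numerically, consider the following small sketch (plain numpy, not an NSPCG routine; the badly row-scaled matrix is a contrived example): taking Q to be the diagonal of A, an easily inverted choice described below as the Jacobi method, sharply reduces the condition number of the preconditioned matrix.

import numpy as np

# Illustrative sketch: a badly row-scaled 1-D Laplacian for which the simple
# splitting matrix Q = diag(A) markedly improves the conditioning of Q^{-1}A.
n = 50
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian
S = np.diag(np.logspace(0, 4, n))                        # severe row scaling
A = S @ L
Q_inv = np.diag(1.0 / np.diag(A))                        # Q = D is trivially invertible
print("cond(A)      =", np.linalg.cond(A))
print("cond(Q^-1 A) =", np.linalg.cond(Q_inv @ A))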

Left preconditioning is illustrated above. Many nonsymmetric accelerators in the package allow other orientations for the preconditioner, such as right preconditioning,

\begin{displaymath}(AQ^{-1})(Qu) = b\end{displaymath}

or two-sided preconditioning,

\begin{displaymath}\left(Q_L^{-1}AQ_R^{-1}\right)(Q_Ru) = Q_L^{-1}b\end{displaymath}

in the event that the preconditioner can be split as $Q = Q_LQ_R$. The variable IQLR [= IPARM(22)] is set by the user to select the orientation of the preconditioner (left, right, or two-sided) for those accelerators which allow orientations other than left preconditioning. Left preconditioning is the default and is available for all accelerators.
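
The three orientations are algebraically equivalent in the sense that each recovers the same solution u. A small sketch (plain numpy, illustrative only; it assumes a diagonal Q split as $Q_L = Q_R = Q^{1/2}$, a choice made here purely for simplicity) checks this directly:

import numpy as np

# Left, right, and two-sided orientations of a diagonal preconditioner Q
# (split as Q = QL QR with QL = QR = Q^{1/2}) all yield the same solution u.
rng = np.random.default_rng(0)
n = 6
A = 4 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
d = np.diag(A).copy()                            # Q = diag(A)
u_left = np.linalg.solve(A / d[:, None], b / d)  # Q^{-1}A u = Q^{-1}b
y = np.linalg.solve(A / d[None, :], b)           # (A Q^{-1})(Qu) = b
u_right = y / d
s = np.sqrt(d)                                   # QL = QR = diag(sqrt(d))
z = np.linalg.solve(A / np.outer(s, s), b / s)   # (QL^{-1} A QR^{-1})(QR u) = QL^{-1} b
u_two = z / s
print(np.allclose(u_left, u_right), np.allclose(u_left, u_two))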

NSPCG supplies preconditioners which can be classified as ``point" or ``line" preconditioners. Iterative methods are often used in the solution of large sparse linear systems which arise from the discretization of partial differential equations using finite differences or finite elements. The solution to the partial differential equation is approximated on a discrete set of unknowns associated with a mesh defined on the domain of the equation. The terms ``point" and ``line" (or ``block") refer to whether the unknowns are updated one at a time, as with a single node in the mesh, or several at a time, as with a line of nodes in the mesh.

Let matrix A be written as

\begin{displaymath}\left(\begin{array}{cccc}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{array} \right)\end{displaymath}

If matrix A is viewed as a point matrix, then n is the order of the system and each $A_{ij}$ is a scalar. In that case, A can be written as a sum of point matrices

\begin{displaymath}A = D - C_L - C_U\end{displaymath}

where D is a diagonal matrix, $C_L$ is a strictly lower triangular matrix, and $C_U$ is a strictly upper triangular matrix. If matrix A is viewed as a block matrix corresponding to a partitioning $\pi$ of the unknowns, then n is the number of groups in the partition and each $A_{ij}$ is itself a matrix. In that case, A can be written as a sum of block matrices

\begin{displaymath}A = D^{(\pi)}- C_L^{(\pi)}- C_U^{(\pi)}\end{displaymath}

where $D^{(\pi)}$ is a block diagonal matrix, $C_L^{(\pi)}$ is a strictly lower triangular block matrix, and $C_U^{(\pi)}$ is a strictly upper triangular block matrix. In NSPCG, certain restrictions apply to block matrices.

In the line versions of the preconditioners, a common operation is the solution of banded subsystems corresponding to the $A_{ii}$ diagonal block submatrices. NSPCG does not employ a cyclic reduction algorithm to solve such systems, but it will attempt to vectorize the solution if such a system is actually composed of multiple independent subsystems of the same size. In this case, the algorithm used is a scalar forward and back substitution for each individual subsystem, but with each operation being done on all the subsystems in a vectorizable way.
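
As a concrete illustration of the point splitting (plain numpy, illustrative only; NSPCG itself is Fortran and works with sparse storage formats):

import numpy as np

# Sketch of the point splitting A = D - C_L - C_U from the text.
rng = np.random.default_rng(1)
A = 5 * np.eye(5) + rng.standard_normal((5, 5))
D = np.diag(np.diag(A))         # diagonal part
C_L = -np.tril(A, k=-1)         # strictly lower triangle, negated
C_U = -np.triu(A, k=1)          # strictly upper triangle, negated
assert np.allclose(A, D - C_L - C_U)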


POINT PRECONDITIONERS


The point preconditioners available in the package are the following:


Richardson method (RF):
For this method, Q = I, so, in effect, this represents the null preconditioner.
Jacobi method:
For this method, Q = D, the diagonal of A. If the matrix is known to have positive diagonal elements, the Jacobi method may be more efficiently applied by requesting diagonal scaling of the matrix and applying the RF method to the system.
Successive Overrelaxation method (SOR):
For this method,

\begin{displaymath}Q = \frac{1}{\omega}D-C_L \end{displaymath}

The SOR iteration parameter $\omega$ can be determined by an automatic adaptive procedure. This method allows no acceleration.
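
Interpreted as the basic splitting iteration $u \leftarrow u + Q^{-1}(b - Au)$, this choice of Q reproduces the classical SOR sweep. A minimal sketch (plain numpy with a dense triangular solve; NSPCG chooses $\omega$ adaptively and uses sparse storage, whereas $\omega = 1.5$ is simply assumed here):

import numpy as np

def sor_solve(A, b, omega=1.5, iters=200):
    # One splitting iteration per pass: u <- u + Q^{-1}(b - A u),
    # with Q = D/omega - C_L lower triangular (cheap to invert).
    D = np.diag(np.diag(A))
    C_L = -np.tril(A, k=-1)
    Q = D / omega - C_L
    u = np.zeros_like(b)
    for _ in range(iters):
        u = u + np.linalg.solve(Q, b - A @ u)
    return u

A = 4 * np.eye(8) - np.eye(8, k=1) - np.eye(8, k=-1)
b = np.ones(8)
print(np.allclose(sor_solve(A, b), np.linalg.solve(A, b)))
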
Symmetric Successive Overrelaxation method (SSOR):
For this method,

\begin{displaymath}Q = \frac{1}{2-\omega}
\left(\frac{1}{\omega}D - C_L\right)
\left(\frac{1}{\omega}D \right)^{-1}
\left(\frac{1}{\omega}D - C_U \right)\end{displaymath}

The SSOR iteration parameter $\omega$ can be determined by an automatic adaptive procedure if SSOR is used with either conjugate gradient or Chebyshev acceleration. Otherwise, a user-supplied fixed $\omega$ is required.
Incomplete LU Decomposition method (ILU(k)):
This method uses an incomplete LU decomposition of the matrix as the preconditioner. The parameter k denotes the level of fill-in to be allowed in the factorization. The form of Q is:

\begin{displaymath}Q = \left\{\begin{array}{ll}
(\Delta - C_L)\,\Delta^{-1}\,(\Delta - C_U) & \mbox{if Case I} \\
(\Delta - S)\,\Delta^{-1}\,(\Delta - T) & \mbox{if Case II}
\end{array} \right.
\end{displaymath}

Case I occurs when matrix A has Property A (i.e., is 2-cyclic) and k=0. Case II occurs if either condition fails. Here, $\Delta$ is a diagonal matrix containing the factorization pivots, S is a strictly lower triangular matrix, and T is a strictly upper triangular matrix. It can be seen that if Case I is true, then a considerable savings in storage is possible since only a vector of pivots of length N has to be stored. Q can then be implicitly represented from just $\Delta$ and the given matrix elements, which are already stored. If Case II is true, then it is necessary to store T as well (and S, if A is nonsymmetric). A fill-in level of k=0 indicates that no fill-in beyond the original matrix pattern of nonzeros is to be allowed in the factorization. For k=1, fill-in resulting from the original nonzero pattern is allowed but no fill-in resulting from this newly-created fill-in is allowed. In general, fill-in at level k results from fill-in from levels $0,1,2, \ldots, k-1.$ As k grows, the number of iterations should decrease but at the expense of increased storage and time per iteration.
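
For experimentation outside NSPCG, SciPy's spilu offers a related incomplete factorization; note the assumption that its drop-tolerance criterion stands in for the level-of-fill ILU(k) described here, so this is an analogue of the technique rather than the same algorithm:

import numpy as np
from scipy.sparse import diags
from scipy.sparse import linalg as sla

# Drop-tolerance incomplete LU used as the splitting matrix Q inside CG.
n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = sla.spilu(A, drop_tol=1e-4)
M = sla.LinearOperator((n, n), ilu.solve)     # action of Q^{-1}
b = np.ones(n)
u, info = sla.cg(A, b, M=M)
print(info, np.linalg.norm(A @ u - b))
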
Modified Incomplete LU Decomposition method (MILU(k)):
This factorization is similar to the
ILU(k) preconditioner except that the diagonal pivots of the factorization are adjusted so that $Q - A$ has zero row sums. For many matrices, this requirement produces a better condition number for $Q^{-1}A$ than the ILU(k) preconditioner does. Also, this requirement forces $Q^{-1}A$ to have at least one eigenvalue equal to one. As in the previous preconditioner, a variable level of fill-in is allowed.
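
Equivalently, letting e denote the vector of all ones, the zero-row-sum condition states that

\begin{displaymath}(Q - A)\,e = 0, \qquad\mbox{i.e.,}\qquad Qe = Ae\end{displaymath}

from which $Q^{-1}Ae = e$; this exhibits e as an eigenvector of $Q^{-1}A$ with eigenvalue one, as claimed above.
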
Neumann Polynomial method:
For this method, a truncated Neumann series approximation to $A^{-1}$ is used. The user can specify the degree of the polynomial to be used. Assuming A is written as $A = D - C$, where D is the diagonal of A and $C = C_L + C_U$, then $A = (I - CD^{-1})D$ so

\begin{eqnarray*}A^{-1} & = & D^{-1}(I-CD^{-1})^{-1} \\
& = & D^{-1}[I+CD^{-1}+(CD^{-1})^2 + \cdots ]
\end{eqnarray*}


A truncated form of this series is then used for $Q^{-1}$. $Q^{-1}$ is not explicitly computed; $Q^{-1}p$ is evaluated for a vector p by using a sequence of matrix-vector multiplications. This method will be effective if the spectral radius of $CD^{-1}$ is less than one.
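
A sketch of this evaluation (plain numpy; the helper neumann_apply is hypothetical, not an NSPCG routine) accumulates the truncated series Horner-style using only matrix-vector products:

import numpy as np

# z = Q^{-1} p = D^{-1}[I + CD^{-1} + ... + (CD^{-1})^s] p,
# built up as z <- p + C D^{-1} z, s times, then one final D^{-1}.
def neumann_apply(A, p, s=3):
    d = np.diag(A)
    C = np.diag(d) - A                  # A = D - C
    z = p.copy()
    for _ in range(s):                  # Horner-style accumulation
        z = p + C @ (z / d)
    return z / d                        # final multiplication by D^{-1}

A = 4 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)
p = np.ones(6)
print(neumann_apply(A, p, s=8))         # approximation to A^{-1} p
print(np.linalg.solve(A, p))            # exact A^{-1} p, for comparison
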
Least Squares Polynomial method:
For this method, a least squares polynomial is used to approximate the inverse of A:

\begin{displaymath}Q^{-1} = p_s(A) \approx A^{-1} \end{displaymath}

Since it is desired that the spectral radius of the iteration matrix $G = I - Q^{-1}A$ be small, $p_s$ is selected so that the function $f(x) = 1 - p_s(x)\,x$ has minimum 2-norm over a domain which contains the spectrum of A. Note that $G = f(A)$ is the iteration matrix. This preconditioner is effective only if A is SPD (symmetric and positive definite) or nearly so, in which case the domain is $[0,\Vert A \Vert _\infty]$.
Reduced System method (RS):
This method first requires that the system Au = b be permuted to a red-black system:

\begin{displaymath}\left(\begin{array}{cc}
D_R & H \\
K & D_B \end{array} \right)
\left(\begin{array}{c}
u_R \\
u_B \end{array} \right) =
\left(\begin{array}{c}
b_R \\
b_B \end{array} \right)
\end{displaymath}

where $D_R$ and $D_B$ are diagonal matrices. This will only be possible if matrix A has Property A [35]. The black unknowns can be eliminated from the system to yield the reduced system:

\begin{displaymath}(D_R - H D_B^{-1} K)\, u_R = b_R - H D_B^{-1} b_B\end{displaymath}

which becomes the new matrix problem to be solved. Once $u_R$ has converged to an answer, $u_B$ is found by

\begin{displaymath}u_B = D_B^{-1} \left( b_B - K u_R \right)\end{displaymath}

The reduced system preconditioner is $D_R$. Note that the reduced system is not explicitly computed with this method. However, NSPCG contains a facility for explicitly computing the reduced system and applying any of the package preconditioners to it. See Section 11 for more details.
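
The following sketch (plain numpy, illustrative only) permutes a 1-D Laplacian to red-black form, builds the reduced system explicitly, and back-substitutes for the black unknowns; it mirrors the equations above, although NSPCG normally keeps the reduced system implicit:

import numpy as np

# Red-black ordering of a 1-D Laplacian (even points red, odd points black)
# followed by explicit formation and solution of the reduced system.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
red, black = np.arange(0, n, 2), np.arange(1, n, 2)
P = np.concatenate([red, black])
Ap, bp = A[np.ix_(P, P)], b[P]                  # permuted red-black system
r = len(red)
D_R, H = Ap[:r, :r], Ap[:r, r:]
K, D_B = Ap[r:, :r], Ap[r:, r:]
D_B_inv = np.diag(1.0 / np.diag(D_B))
u_R = np.linalg.solve(D_R - H @ D_B_inv @ K, bp[:r] - H @ D_B_inv @ bp[r:])
u_B = D_B_inv @ (bp[r:] - K @ u_R)              # recover the black unknowns
u = np.empty(n)
u[red], u[black] = u_R, u_B
print(np.allclose(A @ u, b))
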
LINE PRECONDITIONERS


The line preconditioners available in the package are the following:


Line Jacobi method:
For this method, $Q=D^{(\pi)}$, the block diagonal part of A. For many matrices resulting from finite difference discretizations of partial differential equations, $D^{(\pi)}$ is a tridiagonal matrix. The solution to the preconditioning equation $D^{(\pi)}z = r$ is accomplished by a recursive forward and back solve. If, however, $D^{(\pi)}$ consists of multiple independent subsystems of size KBLSZ, NSPCG will perform each step of the factorization and solution process across all the subsystems in an independent manner. This method will vectorize on computers allowing constant stride vector operations.
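
A sketch of the vectorization strategy just described (plain numpy, not the NSPCG kernels; the sizes are arbitrary assumptions): each step of a scalar forward elimination and back substitution is applied to all independent tridiagonal subsystems at once.

import numpy as np

# Four independent tridiagonal subsystems of size KBLSZ = 5; every elimination
# step operates on all subsystems simultaneously (one row per subsystem).
nsys, kblsz = 4, 5
lo = -np.ones((nsys, kblsz - 1))          # subdiagonals
di = 4.0 * np.ones((nsys, kblsz))         # main diagonals
up = -np.ones((nsys, kblsz - 1))          # superdiagonals
r = np.ones((nsys, kblsz))                # right-hand sides

d, z = di.copy(), r.copy()
for i in range(1, kblsz):                 # vectorized forward elimination
    w = lo[:, i - 1] / d[:, i - 1]
    d[:, i] -= w * up[:, i - 1]
    z[:, i] -= w * z[:, i - 1]
z[:, -1] /= d[:, -1]
for i in range(kblsz - 2, -1, -1):        # vectorized back substitution
    z[:, i] = (z[:, i] - up[:, i] * z[:, i + 1]) / d[:, i]
print(z)                                  # each row solves its own subsystem
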
Line Jacobi (approximate inverse) method:
This method uses a banded approximation to the inverse of $D^{(\pi)}$. The solution of $D^{(\pi)}z = r$ is accomplished by

\begin{displaymath}z = \left[\left(D^{(\pi)}\right)^{-1}\right]r
\end{displaymath}

where the $[\cdot ]$ operator indicates a truncation of the matrix to a banded system. The variable LTRUNC [= IPARM(17)] determines the half-bandwidth to be used in the truncation. If $D^{(\pi)}$ is diagonally dominant, then the diagonals of $\left(D^{(\pi)}\right)^{-1}$ decay exponentially with their distance from the main diagonal. Hence for diagonally dominant $D^{(\pi)}$, a banded approximation to the inverse of $D^{(\pi)}$ suffices. Thus, the preconditioning step changes from a solve to a matrix-vector multiply, a vectorizable operation.
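
A small sketch of the truncation idea (plain numpy; band_truncate is a hypothetical helper, and LTRUNC = 2 is an assumed value, not an NSPCG default):

import numpy as np

# Truncate the inverse of a diagonally dominant block to a band of half-width
# LTRUNC, turning the preconditioning solve into a matrix-vector multiply.
def band_truncate(M, half_bw):
    idx = np.arange(M.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= half_bw
    return np.where(mask, M, 0.0)

n = 12
D = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # diagonally dominant
D_inv_band = band_truncate(np.linalg.inv(D), half_bw=2)   # LTRUNC = 2
r = np.ones(n)
z_approx = D_inv_band @ r             # preconditioning step as a matvec
z_exact = np.linalg.solve(D, r)
print(np.max(np.abs(z_approx - z_exact)))
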
Line SOR (LSOR):
For this method,

\begin{displaymath}Q=\frac{1}{\omega}D^{(\pi)}-C_L^{(\pi)}\end{displaymath}

Line SSOR (LSSOR):
For this method,

\begin{displaymath}Q = \frac{1}{2-\omega}
\left(\frac{1}{\omega}D^{(\pi)}- C_L^{(\pi)}\right)
\left(\frac{1}{\omega}D^{(\pi)}\right)^{-1}
\left(\frac{1}{\omega}D^{(\pi)}- C_U^{(\pi)}\right)\end{displaymath}

Incomplete Block LU Decomposition, Version 1:
For this method, Q takes the form:

\begin{displaymath}Q = \left\{\begin{array}{ll}
\left(M-C_L^{(\pi)}\right)M^{-1}\left(M-C_U^{(\pi)}\right) & \mbox{if Case I} \\
\left(M-S\right)M^{-1}\left(M-T\right) & \mbox{if Case II}
\end{array} \right. \end{displaymath}

Case I occurs when matrix A has block Property A and no fill-in of blocks is allowed. Case II occurs otherwise. M here is a block diagonal matrix. Truncated inverses of diagonal pivot block matrices are used in the construction of the factorization. We illustrate the factorization process in the case that A is block tridiagonal:

\begin{displaymath}A = \left(\begin{array}{ccccc}
D_1 & U_1 & & & \\
L_1 & D_2 & U_2 & & \\
& \ddots & \ddots & \ddots & \\
& & L_{n-2} & D_{n-1} & U_{n-1} \\
& & & L_{n-1} & D_n \end{array} \right)
\end{displaymath}

Then A has block Property A, so Case I is used. Thus we seek a factorization of the form

\begin{displaymath}\left(I-C_L^{(\pi)}M^{-1} \right)\left(M-C_U^{(\pi)}\right)\end{displaymath}
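
Matching the block diagonal of this product to that of A suggests the recurrence below for the pivot blocks of M, with $[\cdot]$ again denoting a banded truncation; this is a sketch of the computation rather than NSPCG's exact formulas:

\begin{displaymath}M_1 = D_1, \qquad M_i = D_i - L_{i-1}\left[M_{i-1}^{-1}\right]U_{i-1},
\quad i = 2, \ldots, n\end{displaymath}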
