Brief Background on Preconditioners

The choice of a preconditioner involves the selection of a matrix Q, called the splitting matrix, such that the preconditioned system

\begin{displaymath}Q^{-1}Au = Q^{-1}b\end{displaymath}

is better conditioned than the original system, Au = b. Clearly, one requirement for Q is that it be easily invertible (i.e., linear systems having Q as the coefficient matrix can be solved with much less effort than the original system Au = b). To reduce the number of iterations, it is also desirable that Q be ``close" to A in the sense that the spectral radius of $I - Q^{-1}A$ is small. Thus, in choosing a preconditioner, the user must choose between methods that perform a large number of ``cheap" iterations and methods that perform a small number of ``expensive" iterations. NSPCG provides a large number of preconditioning methods to aid the user in selecting a preconditioner for a production code.
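
To see the effect numerically, consider the following small sketch (plain numpy, not an NSPCG routine; the badly row-scaled matrix is a contrived example): taking Q to be the diagonal of A, an easily inverted choice described below as the Jacobi method, sharply reduces the condition number of the preconditioned matrix.

import numpy as np

# Illustrative sketch: a badly row-scaled 1-D Laplacian for which the simple
# splitting matrix Q = diag(A) markedly improves the conditioning of Q^{-1}A.
n = 50
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian
S = np.diag(np.logspace(0, 4, n))                        # severe row scaling
A = S @ L
Q_inv = np.diag(1.0 / np.diag(A))                        # Q = D is trivially invertible
print("cond(A)      =", np.linalg.cond(A))
print("cond(Q^-1 A) =", np.linalg.cond(Q_inv @ A))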

Left preconditioning is illustrated above. Many nonsymmetric accelerators in the package allow other orientations for the preconditioner, such as right preconditioning,

\begin{displaymath}(AQ^{-1})(Qu) = b\end{displaymath}

or two-sided preconditioning,

\begin{displaymath}\left(Q_L^{-1}AQ_R^{-1}\right)(Q_Ru) = Q_L^{-1}b\end{displaymath}

in the event that the preconditioner can be split as $Q = Q_LQ_R$. The variable IQLR [= IPARM(22)] is set by the user to select the orientation of the preconditioner (left, right, or two-sided) for those accelerators which allow orientations other than left preconditioning. Left preconditioning is the default and is available for all accelerators.
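
The three orientations are algebraically equivalent in the sense that each recovers the same solution u. A small sketch (plain numpy, illustrative only; it assumes a diagonal Q split as $Q_L = Q_R = Q^{1/2}$, a choice made here purely for simplicity) checks this directly:

import numpy as np

# Left, right, and two-sided orientations of a diagonal preconditioner Q
# (split as Q = QL QR with QL = QR = Q^{1/2}) all yield the same solution u.
rng = np.random.default_rng(0)
n = 6
A = 4 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
d = np.diag(A).copy()                            # Q = diag(A)
u_left = np.linalg.solve(A / d[:, None], b / d)  # Q^{-1}A u = Q^{-1}b
y = np.linalg.solve(A / d[None, :], b)           # (A Q^{-1})(Qu) = b
u_right = y / d
s = np.sqrt(d)                                   # QL = QR = diag(sqrt(d))
z = np.linalg.solve(A / np.outer(s, s), b / s)   # (QL^{-1} A QR^{-1})(QR u) = QL^{-1} b
u_two = z / s
print(np.allclose(u_left, u_right), np.allclose(u_left, u_two))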

NSPCG supplies preconditioners which can be classified as ``point" or ``line" preconditioners. Iterative methods are often used in the solution of large sparse linear systems which arise from the discretization of partial differential equations using finite differences or finite elements. The solution to the partial differential equation is approximated on a discrete set of unknowns associated with a mesh defined on the domain of the equation. The terms ``point" and ``line" (or ``block") refer to whether the unknowns are updated one at a time, as with a single node in the mesh, or several at a time, as with a line of nodes in the mesh.

Let matrix A be written as

\begin{displaymath}\left(\begin{array}{cccc}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{array} \right)\end{displaymath}

If matrix A is viewed as a point matrix, then n is the order of the system and each $A_{ij}$ is a scalar. In that case, A can be written as a sum of point matrices

\begin{displaymath}A = D - C_L - C_U\end{displaymath}

where D is a diagonal matrix, $C_L$ is a strictly lower triangular matrix, and $C_U$ is a strictly upper triangular matrix. If matrix A is viewed as a block matrix corresponding to a partitioning $\pi$ of the unknowns, then n is the number of groups in the partition and each $A_{ij}$ is itself a matrix. In that case, A can be written as a sum of block matrices

\begin{displaymath}A = D^{(\pi)}- C_L^{(\pi)}- C_U^{(\pi)}\end{displaymath}

where $D^{(\pi)}$ is a block diagonal matrix, $C_L^{(\pi)}$ is a strictly lower triangular block matrix, and $C_U^{(\pi)}$ is a strictly upper triangular block matrix. In NSPCG, certain restrictions apply to block matrices.

In the line versions of the preconditioners, a common operation is the solution of banded subsystems corresponding to the $A_{ii}$ diagonal block submatrices. NSPCG does not employ a cyclic reduction algorithm to solve such systems, but it will attempt to vectorize the solution if such a system is actually composed of multiple independent subsystems of the same size. In this case, the algorithm used is a scalar forward and back substitution for each individual subsystem, but with each operation being done on all the subsystems in a vectorizable way.
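
As a concrete illustration of the point splitting (plain numpy, illustrative only; NSPCG itself is Fortran and works with sparse storage formats):

import numpy as np

# Sketch of the point splitting A = D - C_L - C_U from the text.
rng = np.random.default_rng(1)
A = 5 * np.eye(5) + rng.standard_normal((5, 5))
D = np.diag(np.diag(A))         # diagonal part
C_L = -np.tril(A, k=-1)         # strictly lower triangle, negated
C_U = -np.triu(A, k=1)          # strictly upper triangle, negated
assert np.allclose(A, D - C_L - C_U)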


POINT PRECONDITIONERS


The point preconditioners available in the package are the following:


Richardson method (RF):
For this method, Q = I, so, in effect, this represents the null preconditioner.
Jacobi method:
For this method, Q = D, the diagonal of A. If the matrix is known to have positive diagonal elements, the Jacobi method may be more efficiently applied by requesting diagonal scaling of the matrix and applying the RF method to the system.
Successive Overrelaxation method (SOR):
For this method,

\begin{displaymath}Q = \frac{1}{\omega}D-C_L \end{displaymath}

The SOR iteration parameter $\omega$ can be determined by an automatic adaptive procedure. This method allows no acceleration.
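
Interpreted as the basic splitting iteration $u \leftarrow u + Q^{-1}(b - Au)$, this choice of Q reproduces the classical SOR sweep. A minimal sketch (plain numpy with a dense triangular solve; NSPCG chooses $\omega$ adaptively and uses sparse storage, whereas $\omega = 1.5$ is simply assumed here):

import numpy as np

def sor_solve(A, b, omega=1.5, iters=200):
    # One splitting iteration per pass: u <- u + Q^{-1}(b - A u),
    # with Q = D/omega - C_L lower triangular (cheap to invert).
    D = np.diag(np.diag(A))
    C_L = -np.tril(A, k=-1)
    Q = D / omega - C_L
    u = np.zeros_like(b)
    for _ in range(iters):
        u = u + np.linalg.solve(Q, b - A @ u)
    return u

A = 4 * np.eye(8) - np.eye(8, k=1) - np.eye(8, k=-1)
b = np.ones(8)
print(np.allclose(sor_solve(A, b), np.linalg.solve(A, b)))
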
Symmetric Successive Overrelaxation method (SSOR):
For this method,

\begin{displaymath}Q = \frac{1}{2-\omega}
\left(\frac{1}{\omega}D - C_L\right)
\left(\frac{1}{\omega}D \right)^{-1}
\left(\frac{1}{\omega}D - C_U \right)\end{displaymath}

The SSOR iteration parameter $\omega$ can be determined by an automatic adaptive procedure if SSOR is used with either conjugate gradient or Chebyshev acceleration. Otherwise, a user-supplied fixed $\omega$ is required.
Incomplete LU Decomposition method (ILU(k)):
This method uses an incomplete LU decomposition of the matrix as the preconditioner. The parameter k denotes the level of fill-in to be allowed in the factorization. The form of Q is:

\begin{displaymath}Q = \left\{\begin{array}{ll}
(\Delta - C_L)\,\Delta^{-1}\,(\Delta - C_U) & \mbox{if Case I} \\
(\Delta - S)\,\Delta^{-1}\,(\Delta - T) & \mbox{if Case II}
\end{array} \right.
\end{displaymath}

Case I occurs when matrix A has Property A (i.e., is 2-cyclic) and k=0. Case II occurs if either condition fails. Here, $\Delta$ is a diagonal matrix containing the factorization pivots, S is a strictly lower triangular matrix, and T is a strictly upper triangular matrix. It can be seen that if Case I is true, then a considerable savings in storage is possible since only a vector of pivots of length N has to be stored. Q can then be implicitly represented from just $\Delta$ and the given matrix elements, which are already stored. If Case II is true, then it is necessary to store T as well (and S, if A is nonsymmetric). A fill-in level of k=0 indicates that no fill-in beyond the original matrix pattern of nonzeros is to be allowed in the factorization. For k=1, fill-in resulting from the original nonzero pattern is allowed but no fill-in resulting from this newly-created fill-in is allowed. In general, fill-in at level k results from fill-in from levels $0,1,2, \ldots, k-1.$ As k grows, the number of iterations should decrease but at the expense of increased storage and time per iteration.
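
For experimentation outside NSPCG, SciPy's spilu offers a related incomplete factorization; note the assumption that its drop-tolerance criterion stands in for the level-of-fill ILU(k) described here, so this is an analogue of the technique rather than the same algorithm:

import numpy as np
from scipy.sparse import diags
from scipy.sparse import linalg as sla

# Drop-tolerance incomplete LU used as the splitting matrix Q inside CG.
n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = sla.spilu(A, drop_tol=1e-4)
M = sla.LinearOperator((n, n), ilu.solve)     # action of Q^{-1}
b = np.ones(n)
u, info = sla.cg(A, b, M=M)
print(info, np.linalg.norm(A @ u - b))
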
Modified Incomplete LU Decomposition method (MILU(k)):
This factorization is similar to the
ILU(k) preconditioner except that the diagonal pivots of the factorization are adjusted so that $Q - A$ has zero row sums. For many matrices, this requirement produces a better condition number for $Q^{-1}A$ than the ILU(k) preconditioner does. Also, this requirement forces $Q^{-1}A$ to have at least one eigenvalue equal to one. As in the previous preconditioner, a variable level of fill-in is allowed.
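
Equivalently, letting e denote the vector of all ones, the zero-row-sum condition states that

\begin{displaymath}(Q - A)\,e = 0, \qquad\mbox{i.e.,}\qquad Qe = Ae\end{displaymath}

from which $Q^{-1}Ae = e$; this exhibits e as an eigenvector of $Q^{-1}A$ with eigenvalue one, as claimed above.
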
Neumann Polynomial method:
For this method, a truncated Neumann series approximation to $A^{-1}$ is used. The user can specify the degree of the polynomial to be used. Assuming A is written as $A = D - C$, where D is the diagonal of A and $C = C_L + C_U$, then $A = (I - CD^{-1})D$ so

\begin{eqnarray*}A^{-1} & = & D^{-1}(I-CD^{-1})^{-1} \\
& = & D^{-1}[I+CD^{-1}+(CD^{-1})^2 + \cdots ]
\end{eqnarray*}


A truncated form of this series is then used for $Q^{-1}$. $Q^{-1}$ is not explicitly computed; $Q^{-1}p$ is evaluated for a vector p by using a sequence of matrix-vector multiplications. This method will be effective if the spectral radius of $CD^{-1}$ is less than one.
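
A sketch of this evaluation (plain numpy; the helper neumann_apply is hypothetical, not an NSPCG routine) accumulates the truncated series Horner-style using only matrix-vector products:

import numpy as np

# z = Q^{-1} p = D^{-1}[I + CD^{-1} + ... + (CD^{-1})^s] p,
# built up as z <- p + C D^{-1} z, s times, then one final D^{-1}.
def neumann_apply(A, p, s=3):
    d = np.diag(A)
    C = np.diag(d) - A                  # A = D - C
    z = p.copy()
    for _ in range(s):                  # Horner-style accumulation
        z = p + C @ (z / d)
    return z / d                        # final multiplication by D^{-1}

A = 4 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)
p = np.ones(6)
print(neumann_apply(A, p, s=8))         # approximation to A^{-1} p
print(np.linalg.solve(A, p))            # exact A^{-1} p, for comparison
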
Least Squares Polynomial method:
For this method, a least squares polynomial is used to approximate the inverse of A:

\begin{displaymath}Q^{-1} = p_s(A) \approx A^{-1} \end{displaymath}

Since it is desired that the spectral radius of the iteration matrix $G = I - Q^{-1}A$ be small, $p_s$ is selected so that the function $f(x) = 1 - p_s(x)\,x$ has minimum 2-norm over a domain which contains the spectrum of A. Note that $G = f(A)$ is the iteration matrix. This preconditioner is effective only if A is SPD (symmetric and positive definite) or nearly so, in which case the domain is $[0,\Vert A \Vert _\infty]$.
Reduced System method (RS):
This method first requires that the system Au = b be permuted to a red-black system:

\begin{displaymath}\left(\begin{array}{cc}
D_R & H \\
K & D_B \end{array} \right)
\left(\begin{array}{c}
u_R \\
u_B \end{array} \right) =
\left(\begin{array}{c}
b_R \\
b_B \end{array} \right)
\end{displaymath}

where $D_R$ and $D_B$ are diagonal matrices. This will only be possible if matrix A has Property A [35]. The black unknowns can be eliminated from the system to yield the reduced system:

\begin{displaymath}(D_R - H D_B^{-1} K)\, u_R = b_R - H D_B^{-1} b_B\end{displaymath}

which becomes the new matrix problem to be solved. Once $u_R$ has converged to an answer, $u_B$ is found by

\begin{displaymath}u_B = D_B^{-1} \left( b_B - K u_R \right)\end{displaymath}

The reduced system preconditioner is $D_R$. Note that the reduced system is not explicitly computed with this method. However, NSPCG contains a facility for explicitly computing the reduced system and applying any of the package preconditioners to it. See Section 11 for more details.
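
The following sketch (plain numpy, illustrative only) permutes a 1-D Laplacian to red-black form, builds the reduced system explicitly, and back-substitutes for the black unknowns; it mirrors the equations above, although NSPCG normally keeps the reduced system implicit:

import numpy as np

# Red-black ordering of a 1-D Laplacian (even points red, odd points black)
# followed by explicit formation and solution of the reduced system.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
red, black = np.arange(0, n, 2), np.arange(1, n, 2)
P = np.concatenate([red, black])
Ap, bp = A[np.ix_(P, P)], b[P]                  # permuted red-black system
r = len(red)
D_R, H = Ap[:r, :r], Ap[:r, r:]
K, D_B = Ap[r:, :r], Ap[r:, r:]
D_B_inv = np.diag(1.0 / np.diag(D_B))
u_R = np.linalg.solve(D_R - H @ D_B_inv @ K, bp[:r] - H @ D_B_inv @ bp[r:])
u_B = D_B_inv @ (bp[r:] - K @ u_R)              # recover the black unknowns
u = np.empty(n)
u[red], u[black] = u_R, u_B
print(np.allclose(A @ u, b))
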
LINE PRECONDITIONERS


The line preconditioners available in the package are the following:


Line Jacobi method:
For this method, $Q=D^{(\pi)}$, the block diagonal part of A. For many matrices resulting from finite difference discretizations of partial differential equations, $D^{(\pi)}$ is a tridiagonal matrix. The solution to the preconditioning equation $D^{(\pi)}z = r$ is accomplished by a recursive forward and back solve. If, however, $D^{(\pi)}$ consists of multiple independent subsystems of size KBLSZ, NSPCG will perform each step of the factorization and solution process across all the subsystems in an independent manner. This method will vectorize on computers allowing constant stride vector operations.
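
A sketch of the vectorization strategy just described (plain numpy, not the NSPCG kernels; the sizes are arbitrary assumptions): each step of a scalar forward elimination and back substitution is applied to all independent tridiagonal subsystems at once.

import numpy as np

# Four independent tridiagonal subsystems of size KBLSZ = 5; every elimination
# step operates on all subsystems simultaneously (one row per subsystem).
nsys, kblsz = 4, 5
lo = -np.ones((nsys, kblsz - 1))          # subdiagonals
di = 4.0 * np.ones((nsys, kblsz))         # main diagonals
up = -np.ones((nsys, kblsz - 1))          # superdiagonals
r = np.ones((nsys, kblsz))                # right-hand sides

d, z = di.copy(), r.copy()
for i in range(1, kblsz):                 # vectorized forward elimination
    w = lo[:, i - 1] / d[:, i - 1]
    d[:, i] -= w * up[:, i - 1]
    z[:, i] -= w * z[:, i - 1]
z[:, -1] /= d[:, -1]
for i in range(kblsz - 2, -1, -1):        # vectorized back substitution
    z[:, i] = (z[:, i] - up[:, i] * z[:, i + 1]) / d[:, i]
print(z)                                  # each row solves its own subsystem
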
Line Jacobi (approximate inverse) method:
This method uses a banded approximation to the inverse of $D^{(\pi)}$. The solution of $D^{(\pi)}z = r$ is accomplished by

\begin{displaymath}z = \left[\left(D^{(\pi)}\right)^{-1}\right]r
\end{displaymath}

where the $[\cdot ]$ operator indicates a truncation of the matrix to a banded system. The variable LTRUNC [= IPARM(17)] determines the half-bandwidth to be used in the truncation. If $D^{(\pi)}$ is diagonally dominant, then the diagonals of $\left(D^{(\pi)}\right)^{-1}$ decay exponentially with their distance from the main diagonal. Hence for diagonally dominant $D^{(\pi)}$, a banded approximation to the inverse of $D^{(\pi)}$ suffices. Thus, the preconditioning step changes from a solve to a matrix-vector multiply, a vectorizable operation.
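
A small sketch of the truncation idea (plain numpy; band_truncate is a hypothetical helper, and LTRUNC = 2 is an assumed value, not an NSPCG default):

import numpy as np

# Truncate the inverse of a diagonally dominant block to a band of half-width
# LTRUNC, turning the preconditioning solve into a matrix-vector multiply.
def band_truncate(M, half_bw):
    idx = np.arange(M.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= half_bw
    return np.where(mask, M, 0.0)

n = 12
D = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # diagonally dominant
D_inv_band = band_truncate(np.linalg.inv(D), half_bw=2)   # LTRUNC = 2
r = np.ones(n)
z_approx = D_inv_band @ r             # preconditioning step as a matvec
z_exact = np.linalg.solve(D, r)
print(np.max(np.abs(z_approx - z_exact)))
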
Line SOR (LSOR):
For this method,

\begin{displaymath}Q=\frac{1}{\omega}D^{(\pi)}-C_L^{(\pi)}\end{displaymath}

Line SSOR (LSSOR):
For this method,

\begin{displaymath}Q = \frac{1}{2-\omega}
\left(\frac{1}{\omega}D^{(\pi)}- C_L^{(\pi)}\right)
\left(\frac{1}{\omega}D^{(\pi)}\right)^{-1}
\left(\frac{1}{\omega}D^{(\pi)}- C_U^{(\pi)}\right)\end{displaymath}

Incomplete Block LU Decomposition, Version 1:
For this method, Q takes the form:

\begin{displaymath}Q = \left\{\begin{array}{ll}
\left(M-C_L^{(\pi)}\right)M^{-1}\left(M-C_U^{(\pi)}\right) & \mbox{if Case I} \\
\left(M-S\right)M^{-1}\left(M-T\right) & \mbox{if Case II}
\end{array} \right. \end{displaymath}

Case I occurs when matrix A has block Property A and no fill-in of blocks is allowed. Case II occurs otherwise. M here is a block diagonal matrix. Truncated inverses of diagonal pivot block matrices are used in the construction of the factorization. We illustrate the factorization process in the case that A is block tridiagonal:

\begin{displaymath}A = \left(\begin{array}{ccccc}
D_1 & U_1 & & & \\
L_1 & D_2 & U_2 & & \\
& \ddots & \ddots & \ddots & \\
& & L_{n-2} & D_{n-1} & U_{n-1} \\
& & & L_{n-1} & D_n \end{array} \right)
\end{displaymath}

Then A has block Property A, so Case I is used. Thus we seek a factorization of the form

\begin{displaymath}\left(I-C_L^{(\pi)}M^{-1} \right)\left(M-C_U^{(\pi)}\right)\end{displaymath}
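
Matching the block diagonal of this product to that of A suggests the recurrence below for the pivot blocks of M, with $[\cdot]$ again denoting a banded truncation; this is a sketch of the computation rather than NSPCG's exact formulas:

\begin{displaymath}M_1 = D_1, \qquad M_i = D_i - L_{i-1}\left[M_{i-1}^{-1}\right]U_{i-1},
\quad i = 2, \ldots, n\end{displaymath}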
