Half-precision computation refers to performing floating-point operations in a 16-bit format. While half-precision has been driven largely by machine learning applications, recent algorithmic advances in numerical linear algebra have discovered beneficial use cases for half precision in accelerating the solution of linear systems of equations at higher precisions. In this paper, we present a high-performance, mixed-precision linear solver (\(Ax = b\)) for symmetric positive definite systems in double precision using graphics processing units (GPUs). The solver is based on a mixed-precision Cholesky factorization that utilizes the high-performance tensor core units in CUDA-enabled GPUs. Since the Cholesky factors are affected by the low precision, an iterative refinement (IR) solver is required to recover the solution back to double-precision accuracy. Two different types of IR solvers are discussed on a wide range of test matrices. A preprocessing step is also developed, which scales and shifts the matrix, if necessary, in order to preserve its positive definiteness in lower precisions. Our experiments on the V100 GPU show performance speedups of up to 4.7\(\times \) against a direct double-precision solver. However, matrix properties such as the condition number and the eigenvalue distribution can affect the convergence rate, which would consequently affect the overall performance.

The solution of a dense linear system of equations (\(Ax = b\)) is a critical component in many scientific applications. The standard way of solving such systems includes two steps: a matrix factorization step and a triangular solve step. In this paper, we discuss the specific case where the matrix \(A\) is symmetric positive definite.

A key factor for the high performance of MP-IR solvers is the number of iterations in the refinement stage. Convergence is achieved when the residual is small enough. As mentioned before, a maximum of 2\(\times \) speedup is expected from the factorization stage in FP32. This performance advantage can be completely gone if too many iterations are required for convergence. Typically, an MP-IR solver (FP32 \(\rightarrow \) FP64) requires 2–3 iterations for a well-conditioned problem. This is considered a best-case scenario, since the asymptotic speedup approaches 2\(\times \), meaning minimal overhead from the IR stage.
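To make the MP-IR loop concrete, here is a minimal NumPy/SciPy sketch of the classical scheme: factorize \(A\) once in low precision, then refine the solution in FP64 until the residual is small enough. This is an illustrative simplification, not the paper's GPU implementation; it emulates the idea with an FP32 Cholesky rather than FP16 tensor cores, and the names (`mp_ir_solve`, `tol`, `max_iter`) are ours.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mp_ir_solve(A, b, tol=1e-12, max_iter=50):
    """Mixed-precision iterative refinement (FP32 factorization -> FP64 result).

    Illustrative sketch: the paper's solver factorizes in lower precision on
    tensor cores; here we emulate the idea with a float32 Cholesky.
    """
    # Factorize once in low precision (the expensive O(n^3) step).
    c_lo = cho_factor(A.astype(np.float32), lower=True)

    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(max_iter):
        # The residual is computed in the working (double) precision.
        r = b - A @ x
        # Convergence test: stop when the scaled residual is small enough.
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # The correction solve reuses the low-precision factors (cheap O(n^2)).
        d = cho_solve(c_lo, r.astype(np.float32)).astype(np.float64)
        x += d
    return x

# Usage on a synthetic well-conditioned SPD system:
rng = np.random.default_rng(0)
n = 500
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)
b = rng.standard_normal(n)
x = mp_ir_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # ~1e-13 or smaller
```

For a well-conditioned matrix like the one above, the loop typically exits after the 2–3 refinement steps mentioned in the text, so nearly all of the runtime stays in the low-precision factorization.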
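The scale-and-shift preprocessing can likewise be sketched in a few lines. The idea, under our reading: symmetrically scale \(A\) toward unit diagonal so it fits the range of the low precision, then add a small diagonal shift so that rounding to low precision cannot destroy positive definiteness. The function name `scale_and_shift` and the shift heuristic below are our assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def scale_and_shift(A, eps_lo=np.finfo(np.float32).eps):
    """Symmetric diagonal scaling plus an optional diagonal shift.

    Sketch of a preprocessing step that keeps an SPD matrix positive
    definite when rounded to lower precision; the shift size is a
    heuristic, not the paper's exact rule.
    """
    # Two-sided scaling S A S with S = diag(1/sqrt(a_ii)); the diagonal
    # of an SPD matrix is strictly positive, so s is well defined.
    s = 1.0 / np.sqrt(np.diag(A))
    As = (A * s).T * s  # equivalent to diag(s) @ A @ diag(s)

    # Rounding to low precision perturbs the matrix by roughly
    # eps_lo * ||A||, so shift the diagonal by a comparable amount to
    # keep the rounded matrix positive definite.
    shift = eps_lo * np.linalg.norm(As, ord=2)
    As = As + shift * np.eye(A.shape[0])
    return As, s, shift
```

In a full solver one would presumably factorize the scaled-and-shifted matrix in low precision, solve \((SAS)y = Sb\), recover \(x = Sy\), and let the FP64 refinement against the original \(A\) absorb the small perturbation introduced by the shift.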