We present a runtime comparison of two versions of SuperLU_DIST, using up to 128 processors of the IBM SP at NERSC. One version provides the global input interface, and the other provides the distributed input interface. The comparison covers the total runtime of the solver in both 32-bit and 64-bit addressing modes and the time breakdown for the different phases of the solver. We also present an in-depth comparison of our sparse matrix-vector multiplication methods in the context of iterative refinement. Finally, we describe our Fortran 90 interface, which enhances the usability of the software.
The aim of this project was to encapsulate the needs of computational science applications. Performa...
The paper proposes an analytical model for estimating the performance of Pipelined Ring algorithm fo...
Intel Xeon Phi is a coprocessor with sixty-one cores in a single chip. The chip has a more powerful ...
SuperLU_DIST is a distributed memory parallel solver for sparse linear systems. The solver makes sev...
We give an overview of the algorithms, design philosophy, and implementation techniques in the soft...
In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a dis...
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for ...
We investigate performance characteristics for the LU factorization of large matrices with various s...
It is important to have a fast, robust and scalable algorithm to solve a sparse linear system AX=B. ...
Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and ...
This document describes a collection of three related ANSI C subroutine libraries for solving sparse...