SuperLU_DIST is a distributed-memory parallel solver for sparse linear systems. The solver makes several calls to BLAS routines in its numerical factorization phase, and because these BLAS operations are typically computationally dense, the performance of the BLAS library can significantly affect the overall performance of the solver. In this regard, we examine how the overall performance of SuperLU_DIST can be improved by employing optimized BLAS libraries, in particular the Intel Math Kernel Library (MKL) and the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library. Using MKL can provide an approximate performance improvement of 50%, and using PLASMA can improve the performance by around...
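As a minimal sketch of the approach the abstract describes, an optimized BLAS such as MKL can be supplied to SuperLU_DIST at build time through its CMake `TPL_BLAS_LIBRARIES` option; the exact paths and the MKL link line below are assumptions and will vary by installation and MKL version.

```shell
# Configure SuperLU_DIST to link against Intel MKL instead of the reference BLAS.
# MKLROOT, the link line, and compiler wrappers are illustrative; adjust for your system.
cmake .. \
  -DTPL_BLAS_LIBRARIES="-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_Fortran_COMPILER=mpif90
make
```

Because the dense block operations in the factorization phase dominate runtime, swapping only this link line (with no source changes) is typically enough to realize the BLAS-level speedups the abstract reports.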
This dissertation details contributions made by the author to the field of computer science while wo...
The aim of this paper is to evaluate the performance of existing parallel linear equation solvers to...
The ScaLAPACK library for parallel dense matrix computations is built on top of the BLACS communicat...
In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a dis...
Sparse direct solving is a time-consuming operation required by many scientifi...
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for ...
It is important to have a fast, robust and scalable algorithm to solve a sparse linear system AX=B. ...
We present the runtime comparison of the two versions of SuperLU_DIST, using up to 128 processors...
We give an overview of the algorithms, design philosophy, and implementation techniques in the soft...
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
Intel Xeon Phi is a coprocessor with sixty-one cores in a single chip. The chip has a more powerful ...
The paper deals with a parallel approach for the numerical solution of large, sparse, non-symmetric sy...