The paper describes an efficient direct method to solve an equation Ax = b, where A is a sparse matrix, on the Intel® Xeon Phi™ coprocessor. The main challenge for such a system is how to engage all available threads (about 240) and how to reduce OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method consists of decomposing A into a product of lower-triangular, diagonal, and upper-triangular matrices, followed by solves of the resulting three subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce Open...
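The decomposition-then-solve structure described in this abstract can be illustrated with a minimal sketch. It assumes a small dense matrix, no pivoting, and serial execution, so it is not the sparse, statically scheduled, multithreaded implementation the paper refers to. The function names `ldu_decompose` and `ldu_solve` are hypothetical and chosen only for this illustration.

```python
def ldu_decompose(a):
    """Factor a square matrix as A = L * D * U (assumption: no pivoting needed).

    L is unit lower-triangular, U is unit upper-triangular,
    and D is diagonal (returned as a flat list d).
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy, overwritten by Schur complements
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for k in range(n):
        pivot = w[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        d[k] = pivot
        for j in range(k + 1, n):
            U[k][j] = w[k][j] / pivot
        for i in range(k + 1, n):
            L[i][k] = w[i][k] / pivot
        # Update the trailing submatrix (Schur complement).
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                w[i][j] -= L[i][k] * pivot * U[k][j]
    return L, d, U


def ldu_solve(L, d, U, b):
    """Solve A x = b through the three subsystems L z = b, D y = z, U x = y."""
    n = len(b)
    # Forward substitution with the unit lower-triangular factor.
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    # Diagonal solve.
    y = [z[i] / d[i] for i in range(n)]
    # Backward substitution with the unit upper-triangular factor.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]
    L, d, U = ldu_decompose(A)
    print(ldu_solve(L, d, U, b))
```

In a sparse solver the same three phases operate only on stored nonzeros and are usually preceded by a fill-reducing reordering. This sketch leaves both out.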
Problems in the class of unstructured sparse matrix computations are characterized by highly irregul...
Sparse-matrix solution is a dominant part of execution time in simulating VLSI circuits by a detaile...
As part of our effort to parallelise SPICE simulations over multiple FPGAs, we present a parallel FP...
It is important to have a fast, robust and scalable algorithm to solve a sparse linear system AX=B. ...
The last decade has seen rapid growth of single-chip multi-processors (CMPs), which have b...
Intel Xeon Phi is a recently released high-performance co-processor which features 61 core...
In this paper, we propose a lightweight optimization methodology for the ubiquitous sparse matrix-ve...
Recently, the Intel Xeon Phi coprocessor has received increasing attention in high performance compu...
Sparse direct solvers is a time consuming operation required by many scientifi...
Accelerators such as the Graphic Processing Unit (GPU) have increasingly seen use by the science and...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in are...
In this whitepaper, we propose outer-product-parallel and inner-product-parallel sparse matrix-matri...
This paper describes an approach for acceleration of the Hybrid Total FETI (HTFETI) domain decomposi...