We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations for latency reduction from increased spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-Ex and 24-core AMD 'MagnyCours' processors. Incremental gains solely from the 3-level transformations in STS-3 for a fixed ordering, corre...
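For readers unfamiliar with the level set schemes the abstract builds on, the following is a minimal sketch of the classical idea, not the STS-k method itself. It assumes a lower triangular matrix stored row by row as lists of `(column, value)` pairs with the diagonal included; the function names `level_sets` and `solve_lower` are hypothetical. Rows are grouped into levels so that every row depends only on rows in earlier levels, which means the rows inside one level could be solved concurrently.

```python
def level_sets(rows):
    """Group the rows of a sparse lower triangular matrix into levels.

    rows[i] is a list of (j, value) pairs with j <= i (assumed layout,
    diagonal included). A row's level is one more than the highest level
    among the rows it depends on, or 0 if it has no off-diagonal entries.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j, _ in rows[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets


def solve_lower(rows, b):
    """Solve L x = b by forward substitution, visiting rows level by level."""
    x = [0.0] * len(rows)
    for rows_in_level in level_sets(rows):
        # Rows in the same level are mutually independent; this inner loop
        # is the part a parallel implementation would split across cores.
        for i in rows_in_level:
            acc = b[i]
            diag = 0.0
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    acc -= v * x[j]
            x[i] = acc / diag
    return x


if __name__ == "__main__":
    # 4x4 example: rows 0 and 1 are independent, row 2 needs row 0,
    # row 3 needs rows 1 and 2.
    L = [
        [(0, 2.0)],
        [(1, 4.0)],
        [(0, 1.0), (2, 1.0)],
        [(1, 2.0), (2, 1.0), (3, 5.0)],
    ]
    print(level_sets(L))
    print(solve_lower(L, [2.0, 4.0, 3.0, 9.0]))
```

The sketch runs the levels sequentially and leaves out everything related to data placement and locality, which is the concern the abstract attributes to STS-k.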
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-...
In this whitepaper, we propose outer-product-parallel and inner-product-parallel sparse matrix-matri...
Parallel computing promises several orders of magnitude increase in our ability to solve realistic c...
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world application...
In this paper, we present a fine-grained multi-stage metric-based triangular r...
The last decade has seen rapid growth of single-chip multi-processors (CMPs), which have b...
We propose a parallel sparse triangular linear system solver based on the Spike algorithm. Sparse tr...
The Chip Multiprocessor (CMP) will be the basic building block for computer systems rangi...
Sparse triangular solve (SpTRSV) is an extensively studied computational kernel. An important obstac...
While the growing number of cores per chip allows researchers to solve larger scientific and enginee...
A fast parallel algorithm, which is generalized from the parallel algorithms for solving...
A few parallel algorithms for solving triangular systems resulting from parallel factorization of sp...
We present specialized implementations of the preconditioned iterative linear system solver in ILUP...
Over the past few years, parallel sparse direct solvers made significant progr...