International audience. We study the impact of non-uniform memory access (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular, we illustrate how an appropriate placement of threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies and present performance results for a hybrid multicore/GPU LU algorithm as implemented in the public-domain library MAGMA.
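To make the role of the panel factorization concrete, here is a minimal pure-Python sketch of a right-looking blocked LU factorization with partial pivoting. It is an illustration only, not MAGMA's implementation, and it does no NUMA-aware thread or memory placement. The function name `blocked_lu` and the panel width parameter `nb` are hypothetical names chosen for this sketch. It shows only where the panel step sits relative to the trailing-matrix update in each iteration.

```python
def blocked_lu(a, nb=2):
    """Right-looking blocked LU with partial pivoting (illustrative sketch).

    a  : square matrix as a list of lists of floats (not modified)
    nb : panel width (hypothetical tuning parameter)

    Returns (lu, perm). lu stores the unit lower factor L below the diagonal
    and the upper factor U on and above it; perm[i] is the index of the
    original row that ends up in row i, so that row perm[i] of a equals
    row i of L*U.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) Panel factorization: unblocked LU on columns k0..k1-1.
        for j in range(k0, k1):
            # Partial pivoting: pick the largest entry in column j.
            p = max(range(j, n), key=lambda i: abs(lu[i][j]))
            if lu[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                lu[j], lu[p] = lu[p], lu[j]  # swap whole rows
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                lu[i][j] /= lu[j][j]  # multiplier, stored in place of L
                for c in range(j + 1, k1):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # 2) Row block of U: forward substitution with the unit lower
        #    triangle of the panel, applied to columns k1..n-1.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    lu[i][c] -= lu[i][j] * lu[j][c]
        # 3) Trailing-matrix update: A22 -= L21 * U12.
        for i in range(k1, n):
            for c in range(k1, n):
                s = 0.0
                for j in range(k0, k1):
                    s += lu[i][j] * lu[j][c]
                lu[i][c] -= s
    return lu, perm
```

In a sketch like this, step 1 works on a tall, narrow block and involves a pivot search and row swaps for every column, while step 3 is a matrix-matrix product.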
International audience. We present a fast hybrid solver for dense linear systems based on LU factoriza...
Abstract: LU factorization with partial pivoting is a canonical numerical procedure and the main comp...
Parallelizing the LU factorization of sparse Jacobian matrices reduces the execution time of the pow...
International audience. We study the impact of non-uniform memory accesses (NUMA) on the solution of d...
We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear sy...
In this PhD thesis, we study algorithms and implementations to accelerate the solution of dense line...
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access-...
Abstract. We address some key issues in designing dense linear algebra (DLA) algorithms that are co...
Abstract: LU factorization is the most computationally intensive step in solving systems of linear equ...
We address some key issues in designing dense linear algebra (DLA) algorithms that are common for bo...
The sparse matrix solver is a critical component in circuit simulators. Some researches have develop...
Abstract: Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Pe...
Multicore multiprocessors use Non-Uniform Memory Architecture (NUMA) to improve their scalability. ...
The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sp...
We present an out-of-core sparse nonsymmetric LU-factorization algorithm with partial pivoting. We h...