The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel implementations of the GEMM kernel rely on the well-known fork-join execution model to exploit multi-core systems efficiently. However, these implementations are not well suited for task-based applications as they break the data-flow execution model. In this paper, we present a task-based implementation of the GEMM kernel that can be seamlessly leveraged by task-based applications while providing better performance than the fork-join version. Our implementation leverages several advanced features...
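The data-flow style described above can be illustrated with a minimal sketch (this is an illustration of the general technique, not the paper's actual implementation): a tiled GEMM in which each tile update is an OpenMP task, and an `inout` dependence on the output tile lets the runtime serialize only the updates that touch the same tile of C while unrelated tiles proceed concurrently. Matrix size `N` and block size `BS` are arbitrary illustrative choices.

```c
#define N  64          /* matrix dimension (assumed divisible by BS) */
#define BS 16          /* tile (block) size */

/* C[ii..ii+BS, jj..jj+BS] += A[ii.., kk..] * B[kk.., jj..] */
static void gemm_tile(const double *A, const double *B, double *C,
                      int ii, int jj, int kk)
{
    for (int i = ii; i < ii + BS; i++)
        for (int k = kk; k < kk + BS; k++)
            for (int j = jj; j < jj + BS; j++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* Task-based C = C + A*B over BS x BS tiles. */
void task_gemm(const double *A, const double *B, double *C)
{
    #pragma omp parallel
    #pragma omp single
    for (int ii = 0; ii < N; ii += BS)
        for (int jj = 0; jj < N; jj += BS)
            for (int kk = 0; kk < N; kk += BS)
                /* The inout dependence on the first element of the C tile
                   orders the kk updates of that tile; tasks writing
                   different tiles carry no dependence and may overlap. */
                #pragma omp task depend(inout: C[ii * N + jj]) \
                                 firstprivate(ii, jj, kk)
                gemm_tile(A, B, C, ii, jj, kk);
}
```

Because the dependences are expressed per tile rather than through a global fork-join barrier, such tasks can interleave with the tasks of a surrounding task-based application, which is the property the abstract argues fork-join GEMM implementations lack. Compiled without OpenMP support the pragmas are ignored and the code runs serially, producing the same result.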
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
We take advantage of the new tasking features in OpenMP to propose advanced task-parallel algorithms...
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extend...
The high performance computing (HPC) community is obsessed over the general matrix-matrix multiply (...
As chip multi-processors (CMPs) are becoming more and more complex, software solutions such as paral...
We extend a two-level task partitioning previously applied to the inversion of dense matrices via Ga...
Task-based programming models have succeeded in gaining the interest of the hi...
General Matrix Multiplication or GEMM kernels take centre place in high performance computing and ma...
Computing platforms are now extremely complex, providing an increasing number o...
The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new challenges in how...
BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoB...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
Along with the popularity of multicore and manycore, task-based dataflow programming models obtain g...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in are...