A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems that can be solved independently, before collating the results. To standardize the interface to these routines, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS operations in parallel whilst making efficient use of their hardware. We discuss the benefits and drawbacks of the current batched BLAS proposals and perform a number of experiments, focusing on GEMM, to explore their effect on performance. In particular, we analyze the effect of novel data layouts which, for example, interleave the matrices in memory...
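The interleaved layout mentioned above can be made concrete with a small sketch. The C fragment below is our own illustration (not code from the paper), assuming a batch of equally sized column-major m-by-n matrices; the variable names and test values are hypothetical:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int m = 2, n = 2, batch = 3;

    /* Pointer-array layout: each matrix in the batch is its own contiguous
     * block, and element (i,j) of matrix k is A_ptr[k][i + j*m].           */
    double **A_ptr = malloc(batch * sizeof *A_ptr);
    for (int k = 0; k < batch; ++k)
        A_ptr[k] = malloc((size_t)m * n * sizeof **A_ptr);

    /* Interleaved layout: element (i,j) of every matrix in the batch is
     * stored contiguously, at A_int[(i + j*m)*batch + k]; consecutive k
     * gives unit stride, which is what enables cross-matrix vectorisation. */
    double *A_int = malloc((size_t)m * n * batch * sizeof *A_int);

    for (int k = 0; k < batch; ++k)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                double v = 100.0 * k + 10.0 * i + j;  /* arbitrary entry */
                A_ptr[k][i + j * m]            = v;
                A_int[(i + j * m) * batch + k] = v;
            }

    /* Both layouts hold the same data; only the addressing differs. */
    printf("matrix 1, entry (0,1): ptr=%g interleaved=%g\n",
           A_ptr[1][0 + 1 * m], A_int[(0 + 1 * m) * batch + 1]);

    for (int k = 0; k < batch; ++k) free(A_ptr[k]);
    free(A_ptr);
    free(A_int);
    return 0;
}

The trade-off the experiments explore is between the flexibility of the pointer-array form, where each matrix can live anywhere in memory, and the contiguous, vector-friendly access pattern of the interleaved form.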
The increase in performance of the last generations of graphics processors (GPUs) has made this clas...
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building bloc...
The Basic Linear Algebra Subprograms, BLAS, are the basic computational kernels in most ap...
One trend in modern high performance computing (HPC) is to decompose a large linear algebra problem ...
We provide timing results for common linear algebra subroutines across BLAS (Basic Linear Algebra S...
A challenging class of problems arising in many GPU applications, called batched problems, involves ...
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on...
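To fix ideas, the call pattern that such an API is intended to replace can be written as an explicit loop over the standard CBLAS routine. The sketch below is purely illustrative and not the proposed interface itself; the function name gemm_batch_loop and its argument list are our assumptions, and the actual proposals differ in how groups, sizes, and strides are passed:

#include <cblas.h>

/* Perform C[p] <- alpha*A[p]*B[p] + beta*C[p] for p = 0..batch-1, one
 * standard DGEMM at a time.  A batched BLAS routine would accept the
 * whole batch in a single call and schedule the small problems itself. */
void gemm_batch_loop(int batch, int m, int n, int k,
                     double alpha, double *const A[], int lda,
                     double *const B[], int ldb,
                     double beta, double *const C[], int ldc) {
    for (int p = 0; p < batch; ++p)
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A[p], lda, B[p], ldb,
                    beta, C[p], ldc);
}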
One of the key areas for enabling users to efficiently use an HPC system is providing optimized BLAS...
Sparse direct solvers are a time-consuming operation required by many scientifi...
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms t...
In the last ten years, GPUs have dominated the market considering the computin...
This report summarises the main points raised at a recent workshop discussing various extensions to ...
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEM...
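For reference, the operation DGEMM performs (in its untransposed form) is C <- alpha*A*B + beta*C for column-major m-by-k A, k-by-n B and m-by-n C. The naive triple loop below is only a specification sketch of that definition, not the optimized kernel the abstract refers to:

#include <stddef.h>

/* Reference (unoptimized) DGEMM, no transposes: defines exactly what the
 * tuned small-matrix kernels must compute, with leading dimensions
 * lda, ldb, ldc for column-major storage.                                */
static void dgemm_ref(int m, int n, int k, double alpha,
                      const double *A, int lda,
                      const double *B, int ldb,
                      double beta, double *C, int ldc) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc
                                   + beta * C[i + (size_t)j * ldc];
        }
}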