The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine. This obsession is not without reason. Most, if not all, Level 3 Basic Linear Algebra Subprograms (BLAS) routines can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM's performance. Achieving high performance on GEMM is highly architecture dependent, so for each new architecture GEMM has to be programmed and tested to achieve maximal performance. Also, with emergent computer architectures featuring more vector-based and multi- to many-core processors, GEMM performance hinges on the utilization of these technologies. In this res...
Recent years have witnessed a tremendous surge of interest in accelerating sparse linear algebra app...
Using super-resolution techniques to estimate the direction that a signal arrived at a radio receive...
The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms t...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
The objective of high performance computing (HPC) is to ensure that the computational power of hardw...
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performanc...
In a previous PPoPP paper we showed how the FLAME methodology, combined with the SuperMatrix runtim...
In the past, we could rely on technology scaling and new micro-architectural techniques to impro...
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
Abstract. Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The dissemination of multi-core architectures and the later irruption of massively parallel devices,...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
This dissertation presents an architecture to accelerate sparse matrix linear algebra, which is among...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...