Over the past decade, power/energy consumption has become a limiting factor for large-scale and embedded High Performance Computing (HPC) systems. This is especially true for systems that include accelerators, e.g., high-end computing devices, such as Graphics Processing Units (GPUs), with terascale computing capabilities and high power draws that greatly surpass those of multi-core CPUs. Accordingly, improving the node-level power/energy efficiency of an application can have a direct and positive impact on both classes of HPC systems. The research reported in this thesis explores the use of software techniques to improve both the execution time and the power consumption of applications executed on a CPU/GPGPU compute node. We conducted th...
In this paper, we propose one design for matrix-matrix multiplication. The one desi...
Excessive energy consumption has become one of the major challenges in high performance computing. R...
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized fo...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
Energy efficiency has emerged as one of the key performance metrics in computing. In this w...
GPU matrix chain multiplication serves as a basis for a wide range of scientif...
Sparse matrix-matrix multiplication (SpMM) is a key operation in numerous areas from information ...
As users and developers, we are witnessing the opening of a new computing scenario: the introduction...
Matrix multiplication is at the core of high-performance numerical computation. Software methods of ...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
The performance of a parallel matrix-matrix-multiplication routine with the same functionality as DG...
The multiplication of large sparse matrices is a basic operation for many scientific and engineering ...
In this thesis, the performance and energy efficiency of four different implementations of matrix mu...
Graphic processors are becoming faster and faster. Computational power within graphic processing uni...