Boosting the performance and energy efficiency of scientific applications running on high performance computing systems has become crucial. Software- and hardware-based solutions for improving communication performance are recognized as significant means of achieving performance gains, and thus energy savings, for such applications. Because distributed matrix multiplication is a fundamental component of most numerical linear algebra algorithms, improving its performance and energy efficiency is of major concern. To this end, we propose a high performance communication scheme that fully exploits network bandwidth via non-blocking pipeline broadcast with a tuned chunk size. Empirically, substantial performance gains of up to 8.4% and energy savi...
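Why the chunk size needs tuning can be seen from a textbook cost model for a linear (chain) pipelined broadcast. This is a minimal sketch under stated assumptions, not the model or tuning procedure used in this work: it assumes a Hockney-style cost of alpha + beta*c seconds to move one c-byte chunk between neighboring processes, full-duplex links so a process can receive the next chunk while forwarding the current one, p processes in a chain, and an n-byte message split into ceil(n/c) chunks. The function names and all parameter values are hypothetical. Small chunks pay the latency alpha too often; large chunks take long to fill the pipeline.

```python
import math


def pipeline_bcast_time(n, c, p, alpha, beta):
    """Modeled time of a chain pipelined broadcast (hypothetical model).

    n: message size in bytes, c: chunk size in bytes, p: number of processes,
    alpha: per-message latency in seconds, beta: per-byte time in seconds.
    Every step is charged a full-size chunk, even if the last chunk is smaller.
    """
    k = math.ceil(n / c)               # number of chunks
    steps = (p - 1) + (k - 1)          # fill the pipeline, then stream the rest
    return steps * (alpha + beta * c)


def best_chunk_size(n, p, alpha, beta, candidates):
    """Return the candidate chunk size with the lowest modeled time."""
    return min(candidates, key=lambda c: pipeline_bcast_time(n, c, p, alpha, beta))


def analytic_chunk_size(n, p, alpha, beta):
    """Minimizer of the continuous form (p - 2 + n/c) * (alpha + beta*c), for p > 2."""
    return math.sqrt(n * alpha / ((p - 2) * beta))


if __name__ == "__main__":
    n = 8 * 2**20                # 8 MiB message, hypothetical
    p = 16                       # processes in the chain, hypothetical
    alpha, beta = 5e-6, 1e-9     # 5 us latency, about 1 GB/s, hypothetical
    candidates = [2**i for i in range(10, 24)]  # 1 KiB .. 8 MiB
    c = best_chunk_size(n, p, alpha, beta, candidates)
    print("best power-of-two chunk:", c)
    print("modeled time (s):", pipeline_bcast_time(n, c, p, alpha, beta))
    print("analytic optimum (bytes):", round(analytic_chunk_size(n, p, alpha, beta)))
```

With these made-up parameters the search picks 64 KiB chunks, near the analytic optimum of roughly 55 KB, and the unpipelined broadcast (c = n) is modeled as more than 12 times slower. On a real system alpha and beta would have to be measured, which is one reason the chunk size is tuned empirically rather than fixed.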
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
We consider unicast-based pipelined broadcast schemes for clusters connected by multiple Ethernet sw...
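A unicast-based pipelined broadcast can be expressed as a step-by-step schedule of point-to-point sends. The sketch below is a simplified illustration, not the scheme from this paper: it assumes ranks 0..p-1 form a single chain rooted at rank 0 and ignores the switch topology. The function names are hypothetical. At step t, rank r forwards chunk t - r to rank r + 1, so once the pipeline fills, every link carries a chunk concurrently.

```python
def chain_pipeline_schedule(p, k):
    """Build a chain pipelined broadcast schedule (hypothetical sketch).

    Returns a list of steps; each step is a list of (src, dst, chunk)
    unicasts that run concurrently. Rank 0 holds all k chunks initially.
    """
    steps = []
    for t in range(p + k - 2):         # (p - 1) + (k - 1) steps in total
        sends = []
        for r in range(p - 1):
            chunk = t - r              # rank r forwards chunk t - r at step t
            if 0 <= chunk < k:
                sends.append((r, r + 1, chunk))
        steps.append(sends)
    return steps


def check_schedule(p, k, steps):
    """Simulate a schedule; True if no rank sends a chunk it lacks and all end complete."""
    have = [set(range(k))] + [set() for _ in range(p - 1)]
    for sends in steps:
        # Sends within one step are concurrent, so check every sender
        # before delivering any chunk from this step.
        if any(chunk not in have[src] for src, _, chunk in sends):
            return False
        for _, dst, chunk in sends:
            have[dst].add(chunk)
    return all(h == set(range(k)) for h in have)


if __name__ == "__main__":
    for i, sends in enumerate(chain_pipeline_schedule(4, 3)):
        print(i, sends)
```

For p = 4 and k = 3 this yields five steps and nine unicasts in total. A scheme for clusters built from multiple Ethernet switches would likely also need to choose the chain order so that few transfers cross inter-switch links, which this sketch does not model.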
Parallel computing on networks of workstations is intensively used in some application areas such a...
The demands of improving energy efficiency for high performance scientific applications arise crucia...
For the past decade, power/energy consumption has become a limiting factor for large-scale and embed...
A parallel matrix multiplication algorithm is presented, and studies of its performance and estimati...
GPU matrix chain multiplication serves as a basis for a wide range of scientif...
Matrix multiplication is taken as a test bed for parallel processing on heterogeneous networks of wo...
Excessive energy consumption has become one of the major challenges in high performance computing. R...
In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Technology scaling trends have enabled the exponential growth of computing power. However, the perfo...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...