Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, because these are manually produced, they generally represent a greatly simplified view of application behavior, missing the subtle but important-to-performance nuances that may exist in a complete application. This work contributes novel methods to automatically generate highly portable and customizable communication benchmarks from HPC applications. We utilize ScalaTrace, a lossless yet scalable parallel-application tracing framework, to collect selected aspects of the run-time behavior of HPC applications, including communication operations and computation time, while abstracting away the details of the computation proper. We subsequently ge...
Our research project intends to build knowledge about HPC problems to be able to help local research...
Performance measurement and analysis of parallel applications is often challenging, despite many exc...
Overlapping communications with computation is an efficient way to amortize th...
Benchmarks are essential for evaluating HPC hardware and software for petascale machines an...
HPC application developers encounter significant challenges getting their codes to run correctly on ...
LAGADAPATI, MAHESH. Benchmark Generation and Simulation at Extreme Scale. (Under the direction of Fr...
A considerable fraction of scientific discovery nowadays relies on computer simulations. High Per...
The availability of cheap computers with outstanding single-processor performance coupled with Ether...
Moving data between processes has often been discussed as one of the major bottlenecks in parallel c...
Characterizing the communication behavior of large-scale applications is a difficult and costly task...
High-performance computing systems have become increasingly dynamic, complex, and unpredictable. To ...
Nowadays, the whole HPC community is looking forward to the exascale era, with computer and system a...
Performance modeling, the science of understanding and predicting application performance, is import...
In this paper, we analyze existing MPI benchmarking suites, focusing on two restrictions t...
Modern HPC platforms use multiple CPUs, GPUs, and high-performance interconnects per node. Unfor...