Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and their long execution times. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides orders of magnitude smaller, if not near constant-size, communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques of MPI events, we develop a scheme to preserve time and causality of communication events, and we present res...
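The intra-node compression idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified, greedy run-length scheme that folds immediately repeated sequences of MPI events into a single (repeat count, pattern) record. The event representation (call name plus a peer offset relative to the caller's own rank), the function names `compress` and `expand`, and the `max_window` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical event record: (MPI call name, peer offset relative to own rank).
# Using a relative offset instead of an absolute rank is an assumption here; it
# lets ranks that follow the same pattern produce identical records.
Event = Tuple[str, int]
Record = Tuple[int, List[Event]]  # (repeat count, repeated pattern)


def compress(events: List[Event], max_window: int = 8) -> List[Record]:
    """Greedily fold consecutive repetitions of short event patterns."""
    out: List[Record] = []
    i, n = 0, len(events)
    while i < n:
        best_w, best_reps = 1, 1
        for w in range(1, max_window + 1):
            if i + w > n:
                break
            pattern = events[i:i + w]
            reps = 1
            # Count how many times the pattern repeats back to back.
            while events[i + reps * w:i + (reps + 1) * w] == pattern:
                reps += 1
            # Keep the candidate that covers strictly more events.
            if reps > 1 and reps * w > best_reps * best_w:
                best_w, best_reps = w, reps
        out.append((best_reps, events[i:i + best_w]))
        i += best_w * best_reps
    return out


def expand(records: List[Record]) -> List[Event]:
    """Losslessly rebuild the original event sequence for replay."""
    events: List[Event] = []
    for reps, pattern in records:
        events.extend(pattern * reps)
    return events


# A loop of 100 send/receive pairs followed by one barrier.
trace = [("MPI_Send", 1), ("MPI_Recv", -1)] * 100 + [("MPI_Barrier", 0)]
records = compress(trace)
```

In this sketch the 201-event trace collapses to two records regardless of the loop's iteration count, and `expand(records)` reproduces the original sequence, which is the sense in which such a trace stays near constant size while remaining lossless.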
//TRACE is a new approach for extracting and replaying traces of parallel applications to recreate ...
Trace-driven simulation has long been used in both processor and memory studies. The larg...
Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes w...
Benchmarks are essential for evaluating HPC hardware and software for petascale machines an...
Event tracing of applications under dynamic execution is crucial for performance modeling, optimizat...
Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, bec...
High Performance Computing (HPC) systems play an important role in today’s heavily digitized world, ...
The off-line (or post-mortem) analysis of execution event traces is a popular ...
The performance of massively parallel programs is often impacted by the cost of communication across ...
Performance analysis is an essential part of the development process of HPC applications. Thus, deve...
A considerable fraction of scientific discovery nowadays relies on computer simulations. High Per...
This paper presents a preliminary evaluation of TraceR, a trace replay tool built upon the...
This paper presents an optimization of MPI communication, called Adaptive-CoMPI, based on runtime co...
Applications must scale well to make efficient use of even medium-scale parallel systems. Because sc...