Despite supercomputers deliver huge computational power, applications only reach a fraction of it. There are several factors limiting the application performance, and one of the most important is the single processor efficiency because it ultimately dictates the overall achieved performance. We present the folding mechanism, a process that combines measurements captured through minimal instrumentation and coarse-grain sampling ensuring low time dilation (less than 5%). The mechanism reports instantaneous performance and source-code references for optimized binaries accurately by taking advantage of the repetitiveness of many applications, especially in HPC. The mechanism enables the exploration of the application performance and ...
The growing gap between processor and memory speeds has lead to complex memory hierarchies as proces...
The correlation of performance bottlenecks and their associated source code has become a cornerstone...
The end of Dennard scaling also brought an end to frequency scaling as a means to improve performanc...
Nowadays, supercomputers deliver an enormous amount of computation power; however, it is well-known ...
As access to supercomputing resources is becoming more and more commonplace, performance analysis to...
On the road to Exascale computing, both performance and power areas are meant to be tackled at diffe...
Using an extremely large number of processing elements in computing systems leads to unexpected phen...
The results obtained from this project will fundamentally change the way we look at computer perform...
Supercomputers play a key role in countless areas of science and engineering, enabling the developme...
Measurements of actual supercomputer cache performance has not been previously undertaken. PFC-Sim i...
International audienceThe advent of multicore and manycore processors, including GPUs, in the custom...
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-96983-1_10Des...
The growing gap between processor and memory speeds results in complex memory hierarchies as process...
Parallelism is everywhere, with co-processors such as Graphics Processing Units (GPUs) accelerating ...
Nowadays, the whole HPC community is looking forward to the exascale era, with computer and system a...
The growing gap between processor and memory speeds has lead to complex memory hierarchies as proces...
The correlation of performance bottlenecks and their associated source code has become a cornerstone...
The end of Dennard scaling also brought an end to frequency scaling as a means to improve performanc...
Nowadays, supercomputers deliver an enormous amount of computation power; however, it is well-known ...
As access to supercomputing resources is becoming more and more commonplace, performance analysis to...
On the road to Exascale computing, both performance and power areas are meant to be tackled at diffe...
Using an extremely large number of processing elements in computing systems leads to unexpected phen...
The results obtained from this project will fundamentally change the way we look at computer perform...
Supercomputers play a key role in countless areas of science and engineering, enabling the developme...
Measurements of actual supercomputer cache performance has not been previously undertaken. PFC-Sim i...
International audienceThe advent of multicore and manycore processors, including GPUs, in the custom...
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-96983-1_10Des...
The growing gap between processor and memory speeds results in complex memory hierarchies as process...
Parallelism is everywhere, with co-processors such as Graphics Processing Units (GPUs) accelerating ...
Nowadays, the whole HPC community is looking forward to the exascale era, with computer and system a...
The growing gap between processor and memory speeds has lead to complex memory hierarchies as proces...
The correlation of performance bottlenecks and their associated source code has become a cornerstone...
The end of Dennard scaling also brought an end to frequency scaling as a means to improve performanc...