HPC applications usually run at a low fraction of the computer's peak performance. Empirical performance modeling helps assess the scaling behavior of applications automatically, which exposes bottlenecks and makes it easier to improve an application's performance. Current performance-modeling tools neglect the cache behavior of applications, although cache behavior plays a significant role in overall performance because of the growing gap between memory and processor speed. In this thesis, we build an interface between ThreadSpotter, an open-source memory sampler, and Extra-P, a performance-modeling tool, and use it to present and evaluate a methodology for modeling how scaling affects an application's memory access localit...
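Extra-P fits measurements to a performance model normal form, a sum of terms of the form c·p^i·log2^j(p), where p is the number of processes. A minimal sketch of that idea in its simplest form follows. It assumes a single growing term plus a constant, a small hypothetical set of exponents, and selection by residual sum of squares (RSS). Extra-P's actual search space and model-selection criteria are richer, and none of the names below (`fit_scaling_model`, `fit_one`, `Model`, `EXPONENTS_I`, `EXPONENTS_J`) are Extra-P's API. In the setting described above, each y value would be a locality metric, such as a cache miss ratio obtained from ThreadSpotter, measured at p processes. The data used here are synthetic.

```python
import math
from fractions import Fraction
from typing import List, NamedTuple, Optional, Sequence

# Hypothetical, deliberately small exponent sets; Extra-P's own search space is larger.
EXPONENTS_I = (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(3))
EXPONENTS_J = (0, 1, 2)


def term(p: float, i: Fraction, j: int) -> float:
    """One normal-form term without its coefficient: p^i * log2(p)^j."""
    return p ** float(i) * math.log2(p) ** j


class Model(NamedTuple):
    i: Fraction  # exponent of p
    j: int       # exponent of log2(p)
    c0: float    # constant term
    c1: float    # coefficient of the growing term
    rss: float   # residual sum of squares on the measurements

    def __call__(self, p: float) -> float:
        """Evaluate (extrapolate) the model at p processes."""
        return self.c0 + self.c1 * term(p, self.i, self.j)


def fit_one(ps: Sequence[float], ys: Sequence[float], i: Fraction, j: int) -> Optional[Model]:
    """Closed-form least-squares fit of y = c0 + c1 * p^i * log2(p)^j."""
    xs = [term(p, i, j) for p in ps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return None  # term is constant over the inputs, so c1 is not identifiable
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return Model(i, j, c0, c1, rss)


def fit_scaling_model(ps: Sequence[float], ys: Sequence[float]) -> Model:
    """Try every (i, j) hypothesis and keep the one with the smallest RSS."""
    candidates: List[Model] = []
    for i in EXPONENTS_I:
        for j in EXPONENTS_J:
            if i == 0 and j == 0:
                continue  # the constant is already represented by c0
            model = fit_one(ps, ys, i, j)
            if model is not None:
                candidates.append(model)
    return min(candidates, key=lambda m: m.rss)
```

For example, measurements generated from 3 + 2·p·log2(p) at p = 2, 4, 8, 16, 32, 64 are recovered as i = 1, j = 1 with c0 ≈ 3 and c1 ≈ 2. Every hypothesis here has the same two free coefficients, so comparing raw RSS is fair. Multi-term models would need a complexity penalty or cross-validation, because adding terms never increases RSS.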
Poor data locality is a performance bottleneck in modern applications. The hierarchy of caches exiti...
Improving cache performance requires understanding cache behavior. However, measuring cache performa...
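Machine-independent locality analysis commonly rests on the reuse distance, also called the LRU stack distance. This is the number of distinct addresses touched between two consecutive accesses to the same address. In a fully associative LRU cache of C lines, an access hits exactly when its reuse distance is below C, so one pass over a trace yields the miss ratio for every cache size. A minimal sketch follows. It computes the distances exactly and naively, in O(N·M) time for N accesses and M distinct addresses. The function names are hypothetical, and the fully associative LRU cache is a simplifying assumption, since real caches are set-associative. Sampling-based approaches avoid the full trace by estimating these distances from a sparse sample of accesses.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Iterable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """LRU stack distance of each access: the number of distinct addresses
    touched since the previous access to the same address (None = cold access)."""
    stack: "OrderedDict[Hashable, None]" = OrderedDict()  # least recent first, most recent last
    distances: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)  # naive O(M) scan of the LRU stack
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)  # addr becomes the most recently used entry
        else:
            distances.append(None)
            stack[addr] = None
    return distances


def miss_ratio_curve(trace: Sequence[Hashable], sizes: Iterable[int]) -> Dict[int, float]:
    """Miss ratio of a fully associative LRU cache for each size (in lines).
    An access hits iff its reuse distance is smaller than the cache size."""
    distances = reuse_distances(trace)
    if not distances:
        return {c: 0.0 for c in sizes}
    return {
        c: sum(1 for d in distances if d is None or d >= c) / len(distances)
        for c in sizes
    }
```

Consider the cyclic trace "abcabc". Every reuse has distance 2, so a 2-line cache misses on all six accesses. A 3-line cache misses only on the three cold accesses, giving a miss ratio of 0.5.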
With the increasing gap between the speeds of the processor and memory system, memory access has bec...
There is an ever-widening performance gap between processors and main memory, a gap bridged by small...
The increasing computation capability of servers comes with a dramatic increas...
Sparse scientific codes face grave performance challenges as memory bandwidth limitations gr...
Thesis (Ph. D.), University of Rochester, Department of Computer Science, 2017. On modern processors, ...
As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, t...
The growing gap between processor and memory speeds results in complex memory hierarchies as process...
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016...
This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory appli...
With contemporary research focusing its attention primarily on benchmark-driven performance evaluati...
Data locality is central to modern computer designs. The widening gap between processor speed and me...