Abstract. This paper proposes a methodology to study the data reuse quality of task-parallel runtimes. We introduce a coarse-grained version of the reuse distance method called Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation to the parallel schedule. Using the KRD metric, we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work time inflation. Overall, we found that KRD can be used effectively to assess data reuse in parallel applications. The study also revealed that several current runtimes suffer from ...
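The abstract does not give a precise definition of KRD. As a rough illustration only, the sketch below assumes that each task reports the socket it ran on and the data blocks it touched, and then measures, separately for each socket, how many distinct other blocks were touched between two uses of the same block. The `Task` record, the block granularity, the in-order treatment of a task's blocks, and the `kernel_reuse_distances` function are all hypothetical; this is not the paper's definition of KRD.

```python
import math
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record: the socket it ran on and the data blocks it touched.
    socket: int
    blocks: tuple


def kernel_reuse_distances(tasks):
    """Coarse-grained reuse distances, computed per socket over data blocks.

    Tasks are processed in execution order, and each task's blocks are treated
    as accessed in the order listed. For every block access, the distance is
    the number of distinct other blocks touched on the same socket since that
    block's previous use there, or math.inf if that socket never touched it.
    """
    stacks = {}  # socket -> OrderedDict used as an LRU stack (most recent last)
    distances = []
    for task in tasks:
        stack = stacks.setdefault(task.socket, OrderedDict())
        for block in dict.fromkeys(task.blocks):  # drop repeats, keep order
            if block in stack:
                keys = list(stack)
                # Blocks after `block` in the stack were used more recently.
                distances.append(len(keys) - 1 - keys.index(block))
                stack.move_to_end(block)
            else:
                distances.append(math.inf)  # cold or remote first touch
                stack[block] = None
    return distances


if __name__ == "__main__":
    tasks = [Task(0, ("A", "B")), Task(1, ("A",)), Task(0, ("C",)), Task(0, ("A", "B"))]
    print(kernel_reuse_distances(tasks))  # [inf, inf, inf, inf, 2, 2]
```

In this example socket 0 sees blocks A, B, C, A, B, so the second uses of A and B each have distance 2, while socket 1's first touch of A counts as infinite. The linear scan of the stack keeps the sketch short but costs O(n) per access; practical reuse-distance tools typically rely on tree-based structures or sampling instead.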
This thesis presents a methodology to automatically determine a data memory organisation at compilet...
Locality increasingly determines system performance. As a rigorous and precise locality model, reus...
Recent scheduling heuristics for task-based applications have managed to improve their performance by taking int...
The performance and energy efficiency of multicore systems are increasingly dominated by the costs o...
As multicore processors implementing shared-memory programming models have become commonplace, analy...
As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows...
Cache is one of the most widely used components in today's computing systems. Its performance is hea...
The potential for improving the performance of data-intensive scientific programs by enhancing data ...
Locality, characterized by data reuses, determines caching performance. Reuse distance (i.e. LRU st...
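For reference, the classic reuse distance of an access is the number of distinct addresses referenced since the previous access to the same address. For a fully associative LRU cache holding C addresses, an access hits exactly when its reuse distance is less than C, so a single distance profile predicts the miss count for every cache size. The minimal Python sketch below illustrates both points; the function names are illustrative, and the linear search of the stack is chosen for clarity rather than speed.

```python
import math


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The distance is the number of distinct addresses referenced since the
    previous access to the same address; first accesses get math.inf.
    """
    stack = []  # LRU stack: most recently used address at the end
    out = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            out.append(len(stack) - 1 - pos)  # addresses used more recently
            stack.pop(pos)
        else:
            out.append(math.inf)  # cold access
        stack.append(addr)
    return out


def lru_misses(trace, capacity):
    """Misses of a fully associative LRU cache that holds `capacity` addresses.

    An access hits exactly when its reuse distance is smaller than the capacity.
    """
    return sum(1 for d in reuse_distances(trace) if d >= capacity)


if __name__ == "__main__":
    trace = list("abcabda")
    print(reuse_distances(trace))  # [inf, inf, inf, 2, 2, inf, 2]
    print(lru_misses(trace, 3))    # 4
```

With a capacity of 3 the three reuses at distance 2 hit and only the four cold accesses miss; shrinking the capacity to 2 turns every access into a miss, matching a direct LRU simulation of the same trace.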
Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache ...
Profiling can effectively analyze program behavior and provide critical information for feedback-dir...
This paper presents and validates methods to extend reuse distance analysis of application locality ...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...