Abstract—This paper proposes a methodology to study the data reuse quality of task-parallel runtimes. We introduce an extension to the reuse distance method called the Kernel Reuse Distance (KRD). The metric is a low-overhead alternative designed to analyze data reuse at the socket level while minimizing perturbation to the parallel schedule. Using the KRD metric, we show that reuse depends considerably on the system configuration (sockets, cores) and on the runtime scheduler. Furthermore, we correlate KRD with hardware metrics such as cache misses and work time inflation. Overall, we found that KRD can be used effectively to assess data reuse in parallel applications. The study also revealed that several current runtimes suffer from severe ...
Profiling can effectively analyze program behavior and provide critical information for feedback-dir...
Locality increasingly determines system performance. As a rigorous and precise locality model, reus...
The performance and energy efficiency of multicore systems are increasingly dominated by the costs o...
As multicore processors implementing shared-memory programming models have become commonplace, analy...
As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows...
Cache is one of the most widely used components in today's computing systems. Its performance is hea...
Recent scheduling heuristics for task-based applications have managed to improve their performance by taking int...
The potential for improving the performance of data-intensive scientific programs by enhancing data ...
Locality, characterized by data reuses, determines caching performance. Reuse distance (i.e. LRU st...
Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache ...
This thesis presents a methodology to automatically determine a data memory organisation at compilet...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
This paper presents and validates methods to extend reuse distance analysis of application locality ...