As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows increasingly important as an area of research. The impact of certain optimizations depends on whether a program is compute-bound or memory-bound. Memory-bound computations especially benefit from program transformations that improve their data locality, to better exploit modern memory hierarchies. Reuse distance is a useful measure for analyzing data locality in an architecture-agnostic way, i.e., independent of specific cache sizes. Previous work has researched different ways to calculate reuse distance, ranging from deterministic to probabilistic and using different definitions of reuse distance. This thesis investigates the use of stat...
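To make the notion of reuse distance concrete, here is a minimal sketch (in Python, with hypothetical names; not taken from the thesis) that computes, for each access in a small address trace, the number of distinct addresses touched since the previous access to the same address, which is one common definition of reuse distance.

```python
# Minimal sketch (hypothetical, not from the thesis): compute reuse distances
# for a memory-access trace. The reuse distance of an access is taken here as
# the number of *distinct* addresses touched since the previous access to the
# same address; the first access to an address has infinite distance.

def reuse_distances(trace):
    last_use = {}      # address -> index of its previous access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_use:
            # Count distinct addresses seen strictly between the two accesses.
            window = trace[last_use[addr] + 1 : i]
            distances.append(len(set(window)))
        else:
            distances.append(float("inf"))  # cold (first) access
        last_use[addr] = i
    return distances

if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]
    print(reuse_distances(trace))  # [inf, inf, inf, 2, 2, 0]
```

This brute-force version is quadratic in the trace length and is only meant to illustrate the definition; practical analyses rely on more efficient deterministic structures (e.g., balanced trees over the access stack) or on probabilistic and sampling-based approximations, which is the design space the abstract refers to.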