Current high-performance multicore processors provide users with a non-uniform memory access (NUMA) model. These systems perform better when threads access data on memory banks next to the core where they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and is capable of identifying the most promising pages to migrate. Statically, we infer the size of arrays, plus the amount of reuse of each memory access instruction in a program. These estimates rely on a simple, yet accurate, trip count predictor of our own desig...
In this paper, we describe a framework for loop transformations and code generation for NUMA (non-unifo...
Memory bandwidth has become the performance bottleneck for memory intensive programs on modern proce...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
A common feature of many scalable parallel machines is non-uniform memory access (NUMA) --- data acc...
As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows...
Distributed memory parallel architectures support a memory model where some memory accesses are loca...
This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA ar...
Shared memory applications running transparently on top of NUMA architectures often face severe perf...
This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent ...
A multiprocessor system with uniform memory access is difficult to scale due to the increasing conte...
In this work we study the effect of data locality on the performance of Gaussian 03 code running on ...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) desig...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...