This paper explores an important behavior of memory access instructions called access region locality. Unlike traditional temporal and spatial data locality, which focuses on individual memory locations and how accesses to those locations are interrelated, access region locality concerns each static memory instruction and the range of locations it accesses at run time. We consider a program's data, heap, and stack regions in this paper. Our experimental study using a set of SPEC95 benchmark programs shows that most memory reference instructions access a single region at run time. It also shows that the access region of a memory instruction can be predicted accurately at run time by scrutinizing the addressing mode of the in...
The exploitation of locality of reference in shared memory multiprocessors is one of the most import...
Processor speed is increasing much faster than memory access time is improving. This makes memory accesse...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to a...
Data locality is central to modern computer designs. The widening gap between processor speed and me...
Increasing the locality of a memory access profile is an interesting optimization problem, whose sol...
Cache memories were incorporated into microprocessors early on and represent the most common...
Memory system efficiency is crucial for any processor to achieve high performance, especially in the...
Highly aggressive multi-issue processor designs of the past few years and projections for the next d...
Many parallel systems offer a simple view of memory: all storage cells are addressed uniformly. Desp...
The performance of cache memories relies on the locality exhibited by programs. Traditionally this l...
Several benchmarks for measuring memory performance of HPC systems along dimensions of spatial and t...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...