Applications often under-utilize cache space, and there are no software locality optimization techniques available for non-scientific applications. We propose redistributing data in memory to modify reference patterns and thereby improve locality of reference. To understand the potential of this approach and to explain where the gains come from, we introduce distribution misses and define a correlation metric to evaluate spatial locality. As our experimental results show, data distribution can help reduce capacity and conflict misses in regular caches. As an example, we use a profile-based scalar data layout heuristic, which reduced the direct-mapped cache miss ratio by up to 76% on some benchmark traces.
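The effect of data layout on a direct-mapped cache can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the heuristic evaluated in the paper: the function `miss_ratio`, the cache geometry (4 lines of 16 bytes), the variable names, and both layouts are hypothetical and chosen only to show how moving scalars in memory can remove conflict misses.

```python
def miss_ratio(trace, layout, num_lines=4, line_size=16):
    """Miss ratio of a toy direct-mapped cache for a trace of variable names.

    layout maps each variable name to a byte address (hypothetical values).
    """
    cache = [None] * num_lines  # block number currently held by each line
    misses = 0
    for var in trace:
        block = layout[var] // line_size  # memory block holding the variable
        index = block % num_lines         # direct-mapped: exactly one slot
        if cache[index] != block:
            misses += 1
            cache[index] = block
    return misses / len(trace)


# Toy trace: a and b are used together in a loop, c is touched once.
trace = ["a", "b"] * 50 + ["c"]

# Original layout (assumed): a and b are 64 bytes apart, which equals the
# cache size (4 * 16), so they map to the same line and evict each other.
original = {"a": 0, "c": 16, "b": 64}

# Redistributed layout (assumed): a and b are placed in the same 16-byte block.
redistributed = {"a": 0, "b": 8, "c": 16}

before = miss_ratio(trace, original)
after = miss_ratio(trace, redistributed)
print(f"miss ratio before: {before:.3f}, after: {after:.3f}")
```

In this toy case every access misses under the original layout, while the redistributed layout leaves only the two compulsory misses. A profile-based approach would derive such a placement from observed reference patterns rather than by hand.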
The potential for improving the performance of data-intensive scientific programs by enhancing data ...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
Cache memories were incorporated in microprocessors in the early times and represent the most common...
Modern cache designs exploit spatial locality by fetching large blocks of data called cache lines on...
Since the introduction of cache memories in computer architecture, techniques to improve the data lo...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
Data locality is central to modern computer designs. The widening gap between processor speed and me...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cac...
This paper proposes an optimization by an alternative approach to memory mapping. Caches with low se...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
© 1994 ACM. In the past decade, processor speed has become significantly faster than memory speed. S...
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programm...
Abstract—Exploiting locality of reference is key to realizing high levels of performance on modern p...
There is an ever widening performance gap between processors and main memory, a gap bridged by small...