In this paper we propose a new approach to studying the temporal and spatial locality of codes, using a plot of cache miss bandwidth as a function of cache size and line size for a fully associative LRU cache. We apply this approach to several high-performance benchmarks. We show that this plot captures the fine-grained behavior of these benchmarks and explains some of the difficulties facing recent attempts to characterize locality with a few parameters: codes can exhibit different levels of temporal or spatial locality at different cache sizes, and averaging these different behaviors requires properly weighting the cost of misses at different levels of the memory hierarchy. We propose such a scheme, for an average measur...
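The kind of data behind such a plot can be sketched with a small simulator. The following is a minimal illustration, not the authors' tool: it assumes a trace of byte addresses and takes "miss bandwidth" to mean bytes fetched from the next memory level per memory reference (misses times line size, divided by the number of references). The function names `lru_misses` and `miss_traffic_grid` are hypothetical.

```python
from collections import OrderedDict


def lru_misses(addresses, cache_bytes, line_bytes):
    """Count misses of a fully associative LRU cache on a byte-address trace."""
    num_lines = cache_bytes // line_bytes
    if num_lines < 1:
        raise ValueError("cache must hold at least one line")
    cache = OrderedDict()  # keys are line numbers, ordered from LRU to MRU
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def miss_traffic_grid(addresses, cache_sizes, line_sizes):
    """Bytes fetched per reference for each (cache size, line size) pair.

    Assumption: this ratio stands in for the paper's miss bandwidth.
    """
    addresses = list(addresses)
    grid = {}
    for c in cache_sizes:
        for l in line_sizes:
            if l <= c:
                grid[(c, l)] = lru_misses(addresses, c, l) * l / len(addresses)
    return grid


# Two sequential sweeps over 1 KiB in 8-byte words (a made-up toy trace).
trace = [i * 8 for i in range(128)] * 2
grid = miss_traffic_grid(trace, cache_sizes=[512, 1024], line_sizes=[64])
```

On this toy trace the 512-byte cache misses on every line in both sweeps, while the 1024-byte cache misses only during the first sweep, so the traffic per reference halves once the working set fits.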
The growing gap between processor clock speed and DRAM access time puts new demands on software and ...
Modern cache designs exploit spatial locality by fetching large blocks of data called cache lines on...
The widening memory gap reduces performance of applications with poor data locality. Therefore, ther...
Several benchmarks for measuring memory performance of HPC systems along dimensions of spatial and t...
A scalar metric for temporal locality is proposed. The metric is based on LRU stack distance. This p...
This paper studies the theory of caching and temporal and spatial locality. We show the following re...
Locality, characterized by data reuses, determines caching performance. Reuse distance (i.e. LRU st...
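As a hedged illustration of reuse distance (LRU stack distance), the sketch below computes it naively for a short trace. It assumes the convention that the distance is the number of distinct other items accessed since the previous access to the same item, so an access hits in a fully associative LRU cache holding C items exactly when its distance is below C. The function name `stack_distances` is hypothetical, and the quadratic list-based approach is chosen for clarity rather than speed.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = []  # most recently used item is last
    distances = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            # Items above `pos` are the distinct items touched since the last use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # cold (compulsory) access
        stack.append(item)
    return distances


def lru_miss_count(distances, capacity):
    """Misses of a fully associative LRU cache of `capacity` items."""
    return sum(1 for d in distances if d is None or d >= capacity)


dists = stack_distances(list("abcab"))
```

Because one pass yields the distance of every access, the miss count for any cache capacity follows from the same list without re-simulating the trace.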
Data locality is central to modern computer designs. The widening gap between processor speed and me...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
The locality of a program may be quantified by the data footprint over a time period or by the miss ...
Locality is an essential concept of caching, so a well-defined, mathematical model of locality and i...
Over the past decades, core speeds have been improving at a much higher rate than memory bandwidth. ...
The data layout of a program is critical to performance because it determines the spatial localit...