The potential of high-performance systems, especially parallel machines, is generally limited by the bandwidth between processors and memory. To achieve the performance these machines should be capable of, global memory access delays must be alleviated. One approach is to make better use of local storage. To this end, we previously proposed a new local storage facility, the priority data cache (PDC), which supports integrated hardware-software control. This paper focuses on developing a compile-time methodology for using the PDC. We detail the basic algorithm and propose enhancements based on initial performance results. Keywords: data access optimization, data prioritization, compiler-directed cache management, memory hierarchies, data ...
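The abstract above only summarizes the PDC and its compile-time methodology; the paper itself defines the actual mechanism. As a purely illustrative sketch (not the authors' design), the C fragment below shows the general idea behind compiler-directed data prioritization: each memory reference carries a priority hint, and on a miss the cache evicts the lowest-priority, least-recently-used line, so data the compiler marks as heavily reused survives streaming traffic. The cache size, priority scale, and the `cache_access` helper are assumptions made for this example.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_LINES 4   /* tiny, fully associative cache for demonstration */

typedef struct {
    int      valid;
    uint64_t tag;
    int      priority;   /* hypothetical compiler-assigned hint: higher = keep longer */
    uint64_t last_used;  /* timestamp used to break ties (LRU) */
} cache_line_t;

static cache_line_t cache[NUM_LINES];
static uint64_t now;

/* Access one address with a priority hint. Returns 1 on hit, 0 on miss.
 * On a miss, the victim is an invalid line if one exists; otherwise the
 * lowest-priority line, with least-recently-used breaking ties. */
static int cache_access(uint64_t addr, int priority)
{
    uint64_t tag = addr / 64;   /* assume 64-byte cache lines */
    now++;

    for (int i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].last_used = now;
            return 1;           /* hit */
        }
    }

    int victim = -1;
    for (int i = 0; i < NUM_LINES; i++) {
        if (!cache[i].valid) { victim = i; break; }
    }
    if (victim < 0) {
        victim = 0;
        for (int i = 1; i < NUM_LINES; i++) {
            if (cache[i].priority < cache[victim].priority ||
                (cache[i].priority == cache[victim].priority &&
                 cache[i].last_used < cache[victim].last_used)) {
                victim = i;
            }
        }
    }
    cache[victim] = (cache_line_t){ 1, tag, priority, now };
    return 0;                   /* miss */
}

int main(void)
{
    /* A line the compiler has marked as heavily reused. */
    cache_access(0x1000, 9);

    /* Streaming, low-priority traffic that would normally flush the cache. */
    for (uint64_t a = 0x8000; a < 0x8000 + 64 * 16; a += 64)
        cache_access(a, 1);

    printf("high-priority line still resident: %s\n",
           cache_access(0x1000, 9) ? "yes" : "no");
    return 0;
}
```

In this toy run the streaming accesses evict one another while the high-priority line remains resident, which is the effect a compiler-managed prioritization scheme aims for; the paper's PDC realizes this through its own integrated hardware-software interface rather than the policy shown here.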
High-performance scientific computing relies increasingly on high-level large-scale object-oriented ...
Memory (cache, DRAM, and disk) is in charge of providing data and instructions to a computer's pr...
While CPU speed has been improved by a factor of 6400 over the past twenty years, memory bandwidth h...
The widening gap between processor and memory speeds renders data locality optimization a very impor...
An ideal high performance computer includes a fast processor and a multi-million byte memory of comp...
The performance of the memory hierarchy has become one of the most critical elements in the performa...
The performance of cache memories relies on the locality exhibited by programs. Traditionally this l...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
Cache performance is critical in cache-based supercomputers, where the cache-miss/cache-hit memory r...
The cache coherence maintenance problem has been the major obstacle in using private cache memory to...
In the past decade, processor speed has become significantly faster than memory speed. S...
Minimizing power, increasing performance, and delivering effective memory bandwidth are today's prim...
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the t...