Distinguishing transient blocks from frequently used blocks enables servicing references to transient blocks from a small fully-associative auxiliary cache structure. By inserting only frequently used blocks into the main cache structure, we can reduce the number of conflict misses, thereby achieving higher performance and enabling the use of direct-mapped caches, which offer lower power consumption and lower access latencies. In this paper we use a simple probabilistic filtering mechanism that uses random sampling to identify and select the frequently used blocks. Furthermore, by using a small direct-mapped lookup table to cache the most recently accessed blocks in the auxiliary cache, we eliminate the vast majority of the costly fully-associat...
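The filtering idea in this abstract can be illustrated with a minimal sketch: a direct-mapped main cache plus a small fully-associative auxiliary buffer, where a block that hits in the buffer is promoted to the main cache with some probability p. A block referenced k times is then promoted with probability roughly 1 − (1 − p)^k, so frequently used blocks are very likely to be promoted while transient blocks age out of the buffer. All class and parameter names here are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import OrderedDict

class FilteredCache:
    """Sketch of probabilistic insertion filtering (illustrative only).

    New blocks fill only the small fully-associative auxiliary buffer;
    a buffer hit promotes the block into the direct-mapped main cache
    with probability p (the random-sampling filter)."""

    def __init__(self, main_sets=64, aux_entries=8, p=0.2, seed=0):
        self.main = [None] * main_sets   # direct-mapped: one tag per set
        self.aux = OrderedDict()         # fully-associative, LRU-ordered
        self.aux_entries = aux_entries
        self.p = p
        self.rng = random.Random(seed)
        self.hits = self.misses = 0

    def access(self, block):
        idx = block % len(self.main)
        if self.main[idx] == block:      # main-cache hit
            self.hits += 1
            return True
        if block in self.aux:            # auxiliary-buffer hit
            self.hits += 1
            self.aux.move_to_end(block)  # refresh LRU position
            if self.rng.random() < self.p:   # promote with probability p
                del self.aux[block]
                self.main[idx] = block
            return True
        self.misses += 1                 # miss: fill the auxiliary buffer only
        self.aux[block] = None
        if len(self.aux) > self.aux_entries:
            self.aux.popitem(last=False)  # evict the LRU auxiliary entry
        return False
```

With p = 1 every re-referenced block is promoted immediately (no filtering); smaller p values keep one-shot transient blocks out of the main cache at the cost of a slightly delayed promotion for hot blocks.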
Skewed-associative caches have been shown to statistically exhibit lower miss ratios than set-associa...
Due to performance reasons, all ways in set-associative level-one (L1) data caches are accessed in p...
Abstract—In modern processor systems, on-chip Last Level Caches (LLCs) are used to bridge the speed ...
Memory latency has become an important performance bottleneck in current microprocessors. This probl...
L1 data caches in high-performance processors continue to grow in set associativity. Higher associat...
This work addresses the problem of the increasing performance disparity between the microprocessor a...
Energy is an increasingly important consideration in memory system design. Although caches can save ...
Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-mi...
The design trend of caches in modern processors continues to increase their capacity with higher as...
The most important processor performance bottleneck is the ever-increasing gap between the memory an...
Caches mitigate the long memory latency that limits the performance of modern processors. However, c...
Recent studies have shown that in highly associative caches, the performance gap between the Least ...
Low-latency data access is essential for performance. To achieve this, processors use fast first-lev...
Modern microprocessors tend to use on-chip caches that are much smaller than the working set size of...