In the last-level cache, a large fraction of blocks have reuse distances greater than the available cache capacity. Cache performance and efficiency can be improved if some subset of these distant-reuse blocks can reside in the cache longer. The bypass technique is an effective and attractive solution that prevents the insertion of harmful blocks. Our analysis shows that bypass can contribute a significant performance improvement, and that optimal bypass achieves performance similar to that of OPT+B, the theoretical optimal replacement policy. Thus, we propose a bypass technique called Optimal Bypass Monitor (OBM), which makes bypass decisions by learning and predicting the behavior of the optimal bypass. OBM keeps a short global t...
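The optimal-bypass idea described above can be illustrated with a minimal sketch. This is not the OBM mechanism itself (which is only partially described here) but a simplified, fully associative Belady-style simulation with bypass (OPT+B): on a miss with a full cache, the incoming block is inserted only if its next reuse comes sooner than the victim's. The function names and the trace format are illustrative assumptions.

```python
def next_use(trace, start, block):
    """Index of the next access to `block` after position `start`,
    or infinity if it is never reused."""
    for i in range(start + 1, len(trace)):
        if trace[i] == block:
            return i
    return float('inf')

def opt_bypass_misses(trace, capacity):
    """Count misses for a fully associative cache under Belady-style
    replacement with bypass (OPT+B, simplified illustration): a missing
    block is inserted only if its next use precedes the victim's."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) < capacity:
            cache.add(block)
        else:
            # Victim candidate: the resident block reused farthest in the future.
            victim = max(cache, key=lambda b: next_use(trace, i, b))
            # Bypass the incoming block unless it is reused sooner than the victim.
            if next_use(trace, i, block) < next_use(trace, i, victim):
                cache.remove(victim)
                cache.add(block)
    return misses
```

On the trace A B A C A B with capacity 2, the access to C is bypassed (C is never reused), preserving A and B and yielding 3 misses, whereas LRU would evict B and incur 4.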
This paper proposes a novel methodology for cache replacement policy based on techniques of genetic ...
Caches mitigate the long memory latency that limits the performance of modern processors. However, c...
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory acc...
The last level cache (LLC) is critical for mobile computer systems in terms of both energy consumpti...
Modern processors use high-performance cache replacement policies that outperform traditional altern...
This thesis describes a model used to analyze the replacement decisions made by LRU and OPT (Least-R...
Modern processors use high-performance cache replacement policies that outperform traditional altern...
The inherent temporal locality in memory accesses is filtered out by the L1 cache. As a consequence,...
In this paper, we propose a new block selection policy for Last-Level Caches (LLCs) that decides, ba...
Recent studies have shown that in highly associative caches, the performance gap between the Least ...
Abstract—In modern processor systems, on-chip Last Level Caches (LLCs) are used to bridge the speed ...
We introduce a novel approach to predict whether a block should be allocated in the cache or not upo...
Memory latency has become an important performance bottleneck in current microprocessors. This probl...
Classic cache replacement policies assume that miss costs are uniform. However, the correlation betw...
The growing performance gap caused by high processor clock rates and slow DRAM accesses makes cache ...