Modern superscalar processors often suffer long stalls due to load misses in on-chip L2 caches. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). On an L2 cache miss, a predicted value is returned to the processor. When the missing load finally reaches the head of the ROB, the processor checkpoints its state, retires the load, and speculatively continues executing using the predicted value. When the value in memory arrives at the L2 cache, it is compared to the predicted value. If the prediction was correct, speculation has succeeded and execution continues; otherwise, execution is rolled back and restarted from the checkpoint. CAVA uses fast checkpointing, speculative buffering, and a mo...
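The checkpoint/predict/validate flow the abstract describes can be sketched in a few lines. This is an illustrative simulation only, not the paper's implementation: the class name `CavaCore`, the predictor interface, and the register-dictionary checkpoint are all assumptions made for clarity.

```python
# Minimal sketch of the CAVA control flow: on an L2 miss, predict the value,
# checkpoint architectural state, and continue speculatively; when the real
# value arrives, commit on a match or roll back on a mismatch.

class CavaCore:
    def __init__(self, predictor):
        self.predictor = predictor    # value predictor (hypothetical interface)
        self.regs = {}                # architectural register state
        self.checkpoint = None        # saved state while speculating
        self.pending = None           # (addr, predicted) for the outstanding miss

    def load_miss(self, dest_reg, addr):
        """On an L2 miss: predict, checkpoint, and keep executing."""
        predicted = self.predictor(addr)
        self.checkpoint = dict(self.regs)   # fast register checkpoint
        self.regs[dest_reg] = predicted     # retire the load with the prediction
        self.pending = (addr, predicted)

    def memory_returns(self, addr, actual):
        """When the real value reaches the L2, validate the prediction."""
        miss_addr, predicted = self.pending
        self.pending = None
        if predicted == actual:
            self.checkpoint = None          # speculation succeeded; discard checkpoint
            return "commit"
        self.regs = self.checkpoint         # misprediction: restore and re-execute
        self.checkpoint = None
        return "rollback"


core = CavaCore(predictor=lambda addr: 0)   # "predict zero" stand-in predictor
core.load_miss("r1", 0x100)
print(core.memory_returns(0x100, 0))        # prints "commit": prediction matched
```

A real design would also buffer speculative stores and track far more state in the checkpoint; the sketch only captures the validate-or-rollback decision the abstract centers on.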
Energy is an increasingly important consideration in memory system design. Although caches can save ...
Low-latency data access is essential for performance. To achieve this, processors use fast first-lev...
Recent studies have shown that in highly associative caches, the performance gap between the Least ...
The ever-increasing computational power of contemporary microprocessors reduces the execution time s...
Recent architectural approaches that address speculative side-channel attacks aim to prevent softwar...
This paper aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth (bandwidth...
Speculative execution, the base on which modern high-performance general-purpose CPUs are built on, ...
This paper demonstrates how to utilize the inherent error resilience of a wide range of applications...
As the performance gap between the processor cores and the memory subsystem increases, designers are...
Modern processors rely heavily on speculation to provide performance. Techniques such as branch pred...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
Trace caches are used to help dynamic branch prediction make multiple predictions in a cycle by embe...
L2 misses are one of the main causes for stalling the activity in current and future microprocessors...