Modern superscalar processors often suffer long stalls due to load misses in on-chip L2 caches. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). On an L2 cache miss, a predicted value is returned to the processor. When the missing load finally reaches the head of the ROB, the processor checkpoints its state, retires the load, and speculatively continues executing using the predicted value. When the value in memory arrives at the L2 cache, it is compared to the predicted value. If the prediction was correct, speculation has succeeded and execution continues; otherwise, execution is rolled back and restarted from the checkpoint. CAVA uses fast checkpointing, speculative buffering, and a mo...
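The CAVA control flow described above (predict on an L2 miss, checkpoint, speculate, verify against the arriving value, roll back on a mispredict) can be sketched as a toy simulation. This is only an illustration of the flow, not the paper's implementation; all function names (`predict`, `execute`, `checkpoint`, `restore`) are hypothetical stand-ins for hardware mechanisms.

```python
def cava_load(addr, memory, predict, execute, checkpoint, restore):
    """Sketch of the CAVA flow for a single load that misses in the L2."""
    predicted = predict(addr)    # on the L2 miss, return a predicted value
    saved = checkpoint()         # fast checkpoint before retiring the load
    result = execute(predicted)  # continue speculatively with the prediction
    actual = memory[addr]        # the real value eventually arrives at the L2
    if predicted == actual:      # verification: speculation succeeded
        return result
    restore(saved)               # misprediction: roll back to the checkpoint
    return execute(actual)       # re-execute using the correct value

# Usage: a trivial "program" that doubles the loaded value.
memory = {0x10: 21}
state = {"regs": [0]}
result = cava_load(0x10, memory,
                   predict=lambda a: 21,        # a correct prediction
                   execute=lambda v: 2 * v,
                   checkpoint=lambda: dict(state),
                   restore=state.update)
# Either path (correct prediction, or rollback + re-execute) yields the
# same architectural result; only the latency differs in real hardware.
```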
Low-latency data access is essential for performance. To achieve this, processors use fast first-lev...
While runahead execution is effective at parallelizing independent long-latency cache misses, it is ...
As the performance gap between the processor cores and the memory subsystem increases, designers are...
The ever-increasing computational power of contemporary microprocessors reduces the execution time s...
Recent architectural approaches that address speculative side-channel attacks aim to prevent softwar...
Modern processors rely heavily on speculation to provide performance. Techniques such as branch pred...
Speculative execution, the basis on which modern high-performance general-purpose CPUs are built, ...
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. C...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
This paper aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth (bandwidth...
L2 misses are one of the main causes of stalls in current and future microprocessors...
Trace caches are used to help dynamic branch prediction make multiple predictions in a cycle by embe...
This paper demonstrates how to utilize the inherent error resilience of a wide range of applications...