Abstract: The amount of information recorded in the prediction tables of address predictors turns out to be comparable to current on-chip cache sizes. To reduce their area cost, we exploit the spatial-locality property of memory references. We propose to split each address into two parts (high-order bits and low-order bits) and record them in separate tables. This organization allows each unique set of high-order bits to be recorded only once. We apply it to a last-address predictor, and our evaluations show that it yields significant area-cost reductions (28%-60%) without degrading performance.
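The following is a minimal sketch of how such a split last-address predictor might be organized, under illustrative assumptions that are not taken from the paper: a 16-bit low-order part, a 1024-entry last-address table indexed by PC, and a 64-entry high-order-bits table searched linearly with no replacement policy. The point it illustrates is the area saving: each per-PC entry stores only the low-order bits plus a short index into the shared high-order-bits table, rather than a full address.

```c
/* Hedged sketch of a split last-address predictor.
 * Assumed parameters (LOW_BITS, table sizes, PC-modulo indexing, naive
 * linear search, no eviction) are illustrative, not from the paper.
 */
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define LOW_BITS  16                  /* low-order bits kept per entry  */
#define PRED_SIZE 1024                /* last-address table entries     */
#define HOB_SIZE  64                  /* shared high-order-bits entries */

static uint64_t hob_table[HOB_SIZE];  /* one slot per unique high part  */
static int      hob_used;             /* slots allocated so far         */

typedef struct {
    uint32_t low;                     /* low-order bits of last address */
    uint16_t hob_idx;                 /* short index into hob_table     */
    uint8_t  valid;
} pred_entry;

static pred_entry pred[PRED_SIZE];

/* Find (or allocate) the slot holding this high-order part, so each
 * unique high part is recorded only once and shared by many entries. */
static int hob_lookup(uint64_t high)
{
    for (int i = 0; i < hob_used; i++)
        if (hob_table[i] == high)
            return i;
    if (hob_used < HOB_SIZE) {
        hob_table[hob_used] = high;
        return hob_used++;
    }
    return 0;  /* table full: alias to slot 0 (causes mispredictions)  */
}

/* Record the address produced by the load at this PC. */
void predictor_update(uint64_t pc, uint64_t addr)
{
    pred_entry *e = &pred[pc % PRED_SIZE];
    e->low     = (uint32_t)(addr & ((1u << LOW_BITS) - 1));
    e->hob_idx = (uint16_t)hob_lookup(addr >> LOW_BITS);
    e->valid   = 1;
}

/* Predict the next address for this PC by rebuilding it from the
 * shared high-order part and the per-entry low-order part. */
int predictor_lookup(uint64_t pc, uint64_t *addr)
{
    const pred_entry *e = &pred[pc % PRED_SIZE];
    if (!e->valid)
        return 0;
    *addr = (hob_table[e->hob_idx] << LOW_BITS) | e->low;
    return 1;
}

int main(void)
{
    uint64_t a;
    predictor_update(0x400123, 0x7fffcafe1230);
    if (predictor_lookup(0x400123, &a))
        printf("predicted 0x%" PRIx64 "\n", a);
    return 0;
}
```

With these assumed sizes, each predictor entry holds 16 low-order bits plus a 6-bit index instead of a full 48- to 64-bit address, which is the mechanism behind the area-cost reduction the abstract reports.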