As process technology shrinks and clock rates increase, instruction caches can no longer be accessed in one cycle. The alternatives are smaller caches (with a higher miss rate) or large caches with pipelined access (with a higher branch misprediction penalty). In both cases, the performance obtained is far from that of an ideal large cache with one-cycle access. In this paper we present cache line guided prestaging (CLGP), a novel mechanism that overcomes the limitations of current instruction cache implementations. CLGP employs prefetching to load future cache lines into a set of fast prestage buffers. The CLGP algorithm manages these buffers efficiently, trying to fetch from them as often as possible. Therefore, ...
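The general idea of serving fetches from a small set of fast buffers that are filled ahead of the fetch stream can be illustrated with a toy model. This is a minimal sketch and not the CLGP algorithm from the paper: the names `PrestageBuffers`, `fetch`, and `simulate` are hypothetical, the predictor is assumed to be perfect (it reads ahead in the trace), and the oldest-first replacement policy is an assumption made only for illustration.

```python
from collections import OrderedDict


class PrestageBuffers:
    """Toy model of a few fast buffers filled ahead of the fetch stream.

    Hypothetical illustration only; buffer count, lookahead depth, and
    oldest-first replacement are assumptions, not details from the paper.
    """

    def __init__(self, num_buffers=4, lookahead=2):
        self.num_buffers = num_buffers
        self.lookahead = lookahead
        self.buffers = OrderedDict()  # cache line address -> True, oldest first

    def _prestage(self, line):
        """Bring a predicted future line into a prestage buffer."""
        if line in self.buffers:
            self.buffers.move_to_end(line)  # refresh a line that is still needed
            return
        if len(self.buffers) >= self.num_buffers:
            self.buffers.popitem(last=False)  # evict the oldest line
        self.buffers[line] = True

    def fetch(self, line, predicted_next):
        """Return 'prestage' if the line is in a buffer, else 'icache'."""
        source = "prestage" if line in self.buffers else "icache"
        # Prefetch the lines the predictor expects to be fetched next.
        for nxt in predicted_next[: self.lookahead]:
            self._prestage(nxt)
        return source


def simulate(trace, num_buffers=4, lookahead=2):
    """Count how many fetches in `trace` are served from the prestage buffers."""
    model = PrestageBuffers(num_buffers, lookahead)
    hits = 0
    for i, line in enumerate(trace):
        # Assumption: a perfect predictor that knows the upcoming lines.
        predicted = trace[i + 1 : i + 1 + lookahead]
        if model.fetch(line, predicted) == "prestage":
            hits += 1
    return hits, len(trace)
```

Calling `simulate` on a short trace of cache line addresses with different `lookahead` values shows how the fraction of fetches served from the fast buffers depends on how far ahead lines are prestaged; with `lookahead=0` nothing is prestaged and every fetch falls back to the slower cache.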
With off-chip memory access taking 100's of processor cycles, getting data to the processor in a tim...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
A common mechanism to perform hardware-based prefetching for regular accesses to arrays and chained...
In this dissertation, we provide hardware solutions to increase the efficiency of the cache hierarch...
It is well known that memory latency is a major deterrent to achieving the maximum possible performa...
L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. C...
This paper proposes a method of buffering instructions by software-based prefetching. The method all...
The exponentially increasing gap between processors and off-chip memory, as measured in processor cy...
Memory access latency is a main bottleneck limiting further improvement of multi-core proces...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
The speed gap between processors and memory system is becoming the performance bottle...
A well known performance bottleneck in computer architecture is the so-called memory wall. This term...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...