Highly aggressive multi-issue processor designs of the past few years and projections for the next decade require that we redesign the operation of the cache memory system. The number of instructions that must be processed (including incorrectly predicted ones) will approach 16 or more per cycle. Since memory operations account for about a third of all instructions executed, these systems will have to support multiple data references per cycle. In this paper, we explore reference stream characteristics to determine how best to meet the need for ever-increasing access rates. We identify limitations of existing multiported cache designs and propose a new structure, the Locality-Based Interleaved Cache (LBIC), to exploit the characteristics of...
Cache interference is found to play a critical role in optimizing cache allocation among concurr...
To meet the demands of growing computation-intensive applications and the needs of low-power, high-performance ...
During the last two decades, CPU performance has improved much faster than that of memo...
As the issue widths of processors continue to increase, efficient data supply will become ever more ...
On-chip L2 cache architectures, well established in high-performance parallel computing systems, are...
The design of the memory hierarchy in a multi-core architecture is a critical component since it mus...
This paper explores an important behavior of memory access instructions, called access region locali...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
In the world of complex SoCs for consumer applications, multiprocessor architectures usually deploy...
The issue of the power wall has had a drastic impact on many aspects of system design. Even though f...
The gap between CPU and main memory speeds has long been a performance bottleneck. As we move toward...
The most important processor performance bottleneck is the ever-increasing gap between the memory an...
In an effort to push the envelope of system performance, microprocessor designs are continually exp...
Processor speeds increase much faster than memory access times improve. This makes memory accesse...