Providing adequate data bandwidth is essential for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add significantly to hardware complexity. This paper takes an alternative or complementary approach to providing more data bandwidth, called the data-decoupled architecture. With support from the compiler and hardware, the approach partitions the memory stream into multiple independent streams early in the processor pipeline and feeds each stream to a separate memory access queue and cache. Under this model, the paper studies the potential of decoupling memory accesses to a program's local variables...
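The partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes each memory access already carries a stream tag, which stands in for whatever classification the compiler and hardware provide; the names `MemAccess`, `stream`, and `partition_accesses`, the two-stream split, and the addresses are hypothetical and are not taken from the paper. The sketch only shows accesses being steered to separate queues while program order is kept within each stream; it does not model the per-stream caches or timing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MemAccess:
    op: str       # "load" or "store"
    address: int  # illustrative address, not from the paper
    stream: str   # hypothetical tag: "local" (local variables) or "other"


def partition_accesses(accesses):
    """Steer each access to the queue of its stream.

    Appending in trace order preserves program order within a stream.
    """
    queues = {"local": deque(), "other": deque()}
    for access in accesses:
        queues[access.stream].append(access)
    return queues


# A small made-up trace mixing both kinds of access.
trace = [
    MemAccess("load", 0x7FFF0010, "local"),
    MemAccess("load", 0x10002000, "other"),
    MemAccess("store", 0x7FFF0018, "local"),
    MemAccess("store", 0x10002008, "other"),
]
queues = partition_accesses(trace)
```

Each queue could then be drained independently toward its own cache, which is the source of the extra bandwidth the abstract describes.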
The potential of high-performance systems, especially parallel machines, is generally limited by the...
DS is a new microarchitecture that combines decoupled (DAE) and superscalar techniques to exploit in...
In this paper we investigate the behavior of data prefetching on an access decoupled machine and a s...
This paper explores an important behavior of memory access instructions, called access region locali...
Modern superscalar processors use advanced features like dynamic scheduling and speculative executio...
The performance of superscalar processors is more sensitive to the memory system delay than their si...
Highly aggressive multi-issue processor designs of the past few years and projections for the next d...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
An architecture for high-performance scalar computation is proposed and discussed. The main feature ...
Decoupled computer architectures provide high scalar performance by exploiting the fine-grained par...
It is my great pleasure to serve as guest editor for this special issue of TCCA Newsletter, which is...
Processor design techniques, such as pipelining, superscalar, and VLIW, have dramatically decreased ...
Memory system efficiency is crucial for any processor to achieve high performance, especially in the...
The goal of cache design is to exploit data localities; however, the means to this end vary widely a...
Recent work on compilation for DSP-processors deals with optimizing access to local variables of fun...