Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These architectures make use of an access processor to perform the data fetch ahead of demand by the execute process and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used memory systems that are interleaved or pipelined. We undertake a simulation study of the latency effects in decoupled computers when connected to a single, conventional, non-interleaved data memory module so that the effect of decoupling is isolated from the improvement caused by interleavin...
In this paper we investigate the behavior of data prefetching on an access decoupled machine and a s...
It has become a truism that the gap between processor speed and memory access latency is continuing ...
A variety of computer systems from HPC to mobile systems are power limited and performance sensitive...
Decoupling is an architectural organization that may tolerate long memory latencies by executing mem...
Several studies have demonstrated that out-of-order execution processors may not be the most adequat...
This paper discusses an approach to reducing memory latency in future systems. It focuses on systems...
Decoupled computer architectures provide high scalar performance by exploiting the fine-grained par...
An architecture for high-performance scalar computation is proposed and discussed. The main feature ...
It is my great pleasure to serve as guest editor for this special issue of TCCA Newsletter, which is...
The purpose of this paper is to show that using decoupling techniques in a vector processor, the per...
This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel...
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: a...
This is a presentation of initial ideas on techniques that can be used in order to achieve a predict...
The increasing hardware complexity of dynamically-scheduled superscalar processors may compromise th...