Supercomputer architectures are not as fast as logic technology allows because memories are slower than the CPU, conditional jumps limit the usefulness of pipelining and prefetching mechanisms, and functional-unit parallelism is limited by the speed of hardware scheduling. We propose a supercomputer architecture called Ring Of Prefetch Elements (ROPE) that attempts to solve the problems of memory latency and conditional jumps without hardware scheduling. ROPE consists of a pipelined CPU or very-large-instruction-word data path with a new instruction prefetching mechanism that supports general multi-way conditional jumps. To get high performance without scheduling hardware, ROPE relies on an optimizing compiler based on a global code ...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
The memory wall places a significant limit on performance for many modern workloads. These applicati...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-cha...
We present a new hardware-based data prefetching mechanism for enhancing instruction level paralleli...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not...
The large latency of memory accesses in modern computer systems is a key obstacle to achieving high ...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
Pointer-chasing applications tend to traverse composed data structures consisting of multiple indepe...
This paper describes a new hardware approach to data and instruction prefetching for superscalar pr...
In the last century great progress was achieved in developing processors with extremely high computa...
Ever increasing memory latencies and deeper pipelines push memory farther from the processor. Prefet...
In computer systems, latency tolerance is the use of concurrency to achieve high performance in spit...