High-performance multiprocessor systems built around out-of-order processors with aggressive branch predictors execute many memory references that turn out to be on a mispredicted branch path. Previous work that focused on uniprocessors showed that these wrong-path memory references may pollute the caches by bringing in data that are not needed on the correct execution path and by evicting useful data or instructions. They may also increase the amount of cache and memory traffic. On the positive side, however, they may have a prefetching effect for memory references on the correct path. While computer architects have thoroughly studied the impact of wrong-path effects in uniprocessor systems, there is no previous work on its e...
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory archit...
As the difference in speed between processor and memory system continues to increase, it is...
The abstraction of a cache is useful to hide the vast difference in speed of computer processors and...
The core of current-generation high-performance multiprocessor systems is out-of-order execution pro...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
The speculated execution of threads in a multithreaded architecture plus the branch prediction used ...
Instruction cache misses can severely limit the performance of both superscalar processors and high ...
Modern processors attempt to overcome increasing memory latencies by anticipating future references ...
To reduce the overhead of cache coherence enforcement in shared-bus multiprocessors, we propose a ...
Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen ...
Shared-memory multiprocessors built from commodity microprocessors are being increasingly used to pr...
An optimization scheme for a directory-based cache coherence protocol for multistage int...
Thesis (Ph. D.)--University of Washington, 1987. Shared-memory multiprocessors offer increased computa...