At the core of current-generation high-performance multiprocessor systems are out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore t...
To reduce overhead of cache coherence enforcement in shared-bus multiprocessors, we propose a selfin...
A cache coherence protocol for a multiprocessor system. Each processor in the system has...
An optimization scheme for a directory-based cache coherence protocol for multistage int...
High-performance multiprocessor systems built around out-of-order processors with aggressive branch ...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
The speculated execution of threads in a multithreaded architecture plus the branch prediction used ...
Thesis (Ph.D.)--University of Washington, 1987. Shared-memory multiprocessors offer increased computa...
Shared-memory multiprocessors built from commodity microprocessors are being increasingly used to pr...
The directory-based cache coherence protocol is accepted as the common technique in large-scale shared m...
Cache coherence protocols play an important role in the performance of distributed and centralized s...
200 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1993. The use of a private cache in...
An important architectural design decision affecting the performance of coherent caches in shared-me...
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory archit...