To maintain coherence in conventional shared-memory multiprocessor systems, processors first check other processors’ caches before obtaining data from memory. This coherence checking adds latency to memory requests and leads to large amounts of interconnect traffic in broadcast-based systems. Our results for a set of commercial, scientific and multiprogrammed workloads show that on average 67% (and up to 94%) of broadcasts are unnecessary. Coarse-Grain Coherence Tracking is a new technique that supplements a conventional coherence mechanism and optimizes the performance of coherence enforcement. The Coarse-Grain Coherence mechanism monitors the coherence status of large regions of memory, and uses that information to avoid unnecessary broad...
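The idea of tracking coherence status per region can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `RegionTracker`, the 4 KiB region size, and the `others_cache_region` callback (which stands in for a snoop response) are all hypothetical names chosen for illustration.

```python
REGION_BITS = 12  # assumption: 4 KiB regions; the real region size is a design parameter


class RegionTracker:
    """Toy model: remember regions that no other processor is caching."""

    def __init__(self, region_bits=REGION_BITS):
        self.region_bits = region_bits
        self.unshared_regions = set()  # regions known to be uncached elsewhere
        self.broadcasts = 0
        self.direct = 0

    def region_of(self, addr):
        # Drop the low-order bits so every address in a region maps to one key.
        return addr >> self.region_bits

    def request(self, addr, others_cache_region):
        """Issue a memory request; return "direct" or "broadcast".

        others_cache_region is a hypothetical callable(region) -> bool that
        models the combined snoop response for the whole region.
        """
        region = self.region_of(addr)
        if region in self.unshared_regions:
            # Region is known to be unshared: skip the broadcast.
            self.direct += 1
            return "direct"
        self.broadcasts += 1
        if not others_cache_region(region):
            # The broadcast showed nobody else caches this region.
            self.unshared_regions.add(region)
        return "broadcast"

    def observe_external_request(self, addr):
        # Another processor touched the region, so it may now be shared.
        self.unshared_regions.discard(self.region_of(addr))


tracker = RegionTracker()
first = tracker.request(0x1000, lambda region: False)   # must broadcast
second = tracker.request(0x1040, lambda region: False)  # same region, goes direct
tracker.observe_external_request(0x1080)                # region may be shared again
third = tracker.request(0x10C0, lambda region: True)    # must broadcast again
```

In this model the first request to a region pays for a broadcast, and later requests to the same region avoid it until another processor is seen touching that region.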
Across a broad range of applications, multicore technology is the most important factor that drives...
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. ...
In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept con...
Coarse-grain coherence tracking is a new technique that extends a conventional coherence mechanism a...
Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system ...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
Previous work in scalable hardware distributed shared memory (DSM) multiprocessors has established t...
The prevailing use of multicores in Embedded Critical Systems (ECS) is multi-application workloads i...
Caches have the potential to provide multiprocessors with an automatic mechanism for reducing both n...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1993. Simultaneously published ...
In large scale machines, thousands of processor cycles, in other words, missed opportunities to issu...