Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence-miss bottleneck because it improves memory-level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated with the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send Store-ORDered Streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in ...
Directory-based cache coherence protocol is accepted as the common technique in large scale shared m...
Virtual memory based cache coherence is a mechanism that relies only on hardware that already exi...
Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and mas...
Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution ti...
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of executio...
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of executio...
We argue that OS-provided data coherence on non-cache-coherent NUMA multiprocessors (machines with a...
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory ...
High-end computing increasingly relies on shared-memory multiprocessors (SMPs), such as clusters of ...
New generation System-on-Chips will be extremely complex devices, composed from complex subsystems, ...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
To maintain coherence in conventional shared-memory multiprocessor systems, processors first check o...
An adaptive cache coherence mechanism exploits semantic information about the expected or observed a...
During the last few years many different memory consistency protocols have been proposed. These rang...
Coherence-induced cache misses are an important aspect limiting the scalability of shared memory par...