This paper proposes and evaluates Sharing/Timing Adaptive Push (STAP), a dynamic scheme for preemptively sending data from producers to consumers to minimize critical-path communication latency. STAP uses small hardware buffers to dynamically detect sharing patterns and timing requirements. The scheme applies to both intra-node and inter-socket directory-based shared-memory networks. We integrate STAP into a MOESI cache-coherence protocol using heuristics to detect different data sharing patterns, including broadcasts, producer/consumer, and migratory-data sharing. Using 12 benchmarks from the PARSEC and SPLASH-2 suites in 3 different configurations, we show that our scheme significantly reduces communication latency in NUMA systems and achie...
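The abstract does not spell out STAP's detection heuristics, so the following is only a rough illustration of the general idea: classify a cache block's recent access history as broadcast, producer/consumer, or migratory sharing, and pick the cores a push would target. The `BlockHistory` buffer, the `epochs`/`classify`/`push_targets` helpers, the 16-entry capacity, and the three-reader broadcast threshold are all hypothetical assumptions, not the paper's design, and timing detection is not modeled.

```python
from collections import deque
from enum import Enum


class Pattern(Enum):
    UNKNOWN = "unknown"
    MIGRATORY = "migratory"
    PRODUCER_CONSUMER = "producer/consumer"
    BROADCAST = "broadcast"


def epochs(events):
    """Split (core, op) events into (writer, readers) pairs, one per write.

    readers holds the distinct cores other than the writer that read the
    block after that write and before the next one. Reads seen before the
    first write in the buffer are ignored.
    """
    result = []
    for core, op in events:
        if op == "W":
            result.append((core, set()))
        elif result and core != result[-1][0]:
            result[-1][1].add(core)
    return result


def classify(events, broadcast_threshold=3):
    """Hypothetical heuristic; STAP's actual rules are not given here."""
    eps = epochs(events)
    if len(eps) < 2:
        return Pattern.UNKNOWN
    complete = eps[:-1]  # the newest epoch may still be collecting readers
    writers = {w for w, _ in eps}
    if len(writers) == 1:
        # One producer: many readers per write -> broadcast, few -> P/C.
        sizes = [len(r) for _, r in complete]
        if all(s >= broadcast_threshold for s in sizes):
            return Pattern.BROADCAST
        if all(1 <= s < broadcast_threshold for s in sizes):
            return Pattern.PRODUCER_CONSUMER
        return Pattern.UNKNOWN
    # Migratory: the only reader of each write is the core that writes next
    # (read-then-write hand-offs A -> B -> C ...).
    if all(r == {eps[i + 1][0]} and w != eps[i + 1][0]
           for i, (w, r) in enumerate(complete)):
        return Pattern.MIGRATORY
    return Pattern.UNKNOWN


def push_targets(events, broadcast_threshold=3):
    """Cores a hypothetical push engine would forward the next write to.

    For producer/consumer and broadcast blocks, predict that the readers of
    the last complete epoch will read again; otherwise push nothing.
    """
    if classify(events, broadcast_threshold) in (
            Pattern.PRODUCER_CONSUMER, Pattern.BROADCAST):
        return set(epochs(events)[-2][1])
    return set()


class BlockHistory:
    """Bounded per-block event buffer (capacity is an assumed value)."""

    def __init__(self, capacity=16):
        self.events = deque(maxlen=capacity)  # oldest events fall off

    def record(self, core, op):
        self.events.append((core, op))  # op is "R" or "W"

    def pattern(self, broadcast_threshold=3):
        return classify(self.events, broadcast_threshold)


h = BlockHistory()
for core, op in [(0, "W"), (1, "R"), (0, "W"), (1, "R"), (0, "W")]:
    h.record(core, op)
print(h.pattern(), push_targets(h.events))  # Pattern.PRODUCER_CONSUMER {1}
```

Treating each write as the boundary of a sharing epoch keeps the per-block state small, in the spirit of the small hardware buffers the abstract describes; a real coherence controller would keep such state alongside the directory entry rather than in software.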
Increasing on-chip wire delay and growing off-chip miss latency present two key challenges in desig...
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...
Shared-memory multiprocessors are becoming increasingly popular as a high-performance, easy to progr...
This dissertation explores techniques for reducing the costs of inter-processor communication i...
This thesis presents a new cache coherence protocol for shared bus multicache systems, and addresses...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
As the difference in speed between processor and memory system continues to increase, it is...
The goal of this work is to explore architectural mechanisms for supporting explicit communication...
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of executio...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms ...
Real-time systems are required to respond to their physical environment within a predictable time. Whi...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
A major challenge in multi-core real-time systems is the interference problem on the shared hardware...