Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for its whole lifetime. This overhead includes resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove all coherence-related traffic for thread-local data, as well as the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely back...
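The allocate-on-remote-miss rule stated in this abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the `Directory` class, its `on_miss` method, and the choice to record the home node as a sharer on the first remote miss are all assumptions made for illustration.

```python
# Toy model of the allocate-on-remote-miss idea; all names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Sparse directory owned by one affinity domain (the home node)."""
    home_node: int
    entries: dict = field(default_factory=dict)  # line address -> set of sharer nodes

    def on_miss(self, line_addr: int, requester_node: int) -> bool:
        """Handle a cache miss; return True if the line is tracked afterwards."""
        if line_addr in self.entries:
            # Line is already tracked: record the new sharer.
            self.entries[line_addr].add(requester_node)
            return True
        if requester_node == self.home_node:
            # Local miss on an untracked line: treat it as thread-local data
            # and allocate no directory state.
            return False
        # First remote miss: allocate an entry now.
        # Assumption: the home node may hold an untracked copy, so it is
        # conservatively recorded as a sharer alongside the requester.
        self.entries[line_addr] = {self.home_node, requester_node}
        return True
```

In this model, a line touched only by its home node never occupies a directory entry, which is the saving the abstract describes for thread-private data.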
Recent research shows that the occupancy of the coherence controllers is a major performance bottlen...
Next generation multicore applications will process massive amounts of data with significant sharing...
Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data m...
A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Di...
As computing power has increased over the past few decades, science and engineering have found more ...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Conventional directory coherence operates at the finest granularity possible, that of a cache block....
With increasing core counts, the scalability of directory-based cache coherence has become a challen...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
Shared memory provides an attractive and intuitive programming model that makes good use of programm...
Caches enhance the performance of multiprocessors by reducing network traffic and average memory acc...