We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using an execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data replication significantly reduces cache miss rates, while a fast network-level thread migration scheme takes advantage of shared data locality to reduce remote cache accesses that limit traditional NUCA performance. EM area and energy consumption are very competitive, and, on average, it outperforms a directory-based MOESI baseline by 1.3× and a t...
We present design details and some initial performance results of a novel scalable shared memory mul...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
For certain applications involving chip multiprocessors with more than 16 cores, a directoryless arc...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability ...
Designing an efficient memory system is a big challenge for future multicore systems. In particular,...
We introduce the Execution Migration Machine (EM²), a novel, scalable shared-memory architecture for...
Designing an efficient memory system is a big challenge for future multicore systems. In particular,...
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Increasing on-chip wire delay and growing off-chip miss latency present two key challenges in desig...
Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data m...
Shared memory systems are becoming increasingly complex as they typically integrate several storage ...
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...