We introduce the Execution Migration Machine (EM2), a novel, scalable shared-memory architecture for large-scale multicores constrained by off-chip memory bandwidth. EM2 reduces cache miss rates, and consequently off-chip memory usage, by permitting only one copy of data to be stored anywhere in the system: when a thread wishes to access an address that is not cached on the core it is executing on, it migrates to the appropriate core and continues execution there. Using detailed simulations of a range of 256-core configurations on the SPLASH-2 benchmark suite, we show that EM2 improves application completion times by 18% on average while remaining competitive with traditional architectures in silicon area.
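The migrate-on-access rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the EM2 implementation: the round-robin striping of cache lines across cores, the 64-byte line size, and the names `home_core` and `Thread` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

NUM_CORES = 256   # matches the 256-core configurations mentioned in the text
LINE_BYTES = 64   # assumed cache-line size (not stated in the source)


def home_core(addr: int) -> int:
    """Return the single core allowed to cache addr.

    Assumption: cache lines are striped round-robin across cores.
    The source does not specify the actual placement policy.
    """
    return (addr // LINE_BYTES) % NUM_CORES


@dataclass
class Thread:
    core: int            # core the thread is currently executing on
    migrations: int = 0  # number of migrations performed so far

    def access(self, addr: int) -> int:
        """Access addr, migrating first if its home is another core."""
        target = home_core(addr)
        if target != self.core:
            # The data never moves; the thread moves to the data.
            self.core = target
            self.migrations += 1
        return self.core
```

In this model, a thread starting on core 0 that touches address 0 stays put, while touching address 64 moves it to core 1; a further access within the same line causes no additional migration.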
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive...
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardize...
Designing scalable transaction processing systems on modern hardware has been a challenge for almost...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor d...
For certain applications involving chip multiprocessors with more than 16 cores, a directoryless arc...
As transistor density continues to grow geometrically, processor manufacturers are already able to p...
Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRA...
Chip multiprocessors have the potential to exploit thread level parallelism, particularly attractive...
With the advancement of design and fabrication of high-performance integrated circuits technology, i...
Single chip multicore processors are now prevalent and processors with hundreds of cores are being p...
The rising core count per processor is pushing chip complexity to a level that hardware-based cache...
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability ...