Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Architecture (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this problem using data replication and data migration. In this paper, we consider another mechanism, hardware-level thread migration. This approach, we argue, can better exploit shared data locality for NUCA designs by effectively replacing multiple roun...
Locality has always been a critical factor in on-chip data placement on CMPs as accessing further-aw...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Increases in on-chip communication delay and the large working sets of server and scientific workloa...
For certain applications involving chip multiprocessors with more than 16 cores, a directoryless arc...
Chip multiprocessors have the potential to exploit thread-level parallelism, particularly attractive...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be cl...
The microprocessor industry has converged on the chip multiprocessor (CMP) as the architecture of choice to ...
The design of the memory hierarchy in a multi-core architecture is a critical component since it mus...
In current multi-core systems with the shared last-level cache (LLC) physically distributed ...
Future CMPs will have more cores and greater on-chip cache capacity. The on-chip cache can either be ...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...