Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement, and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip reconfigurations.
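To make the idea of joint thread and data placement concrete, the sketch below shows a simple greedy heuristic on a mesh of tiles: threads with the largest working sets pick home tiles first, and each thread's data then spills into the nearest banks with free capacity. This is only an illustrative toy, not the CDCS algorithm; the mesh size, bank capacity, and working-set numbers are made-up assumptions.

```python
# Illustrative sketch of joint thread/data placement on a tiled mesh.
# NOT the CDCS algorithm; all parameters below are hypothetical.
from itertools import product

MESH = 4            # assumed 4x4 mesh, one core + one LLC bank per tile
BANK_LINES = 512    # assumed bank capacity, in cache lines

def dist(a, b):
    """Manhattan hop count between two tiles (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def place(threads):
    """threads: dict thread_id -> working-set size in lines.
    Greedy: biggest working sets choose home tiles first, then their
    data fills the closest banks that still have free capacity."""
    tiles = list(product(range(MESH), repeat=2))
    free = {t: BANK_LINES for t in tiles}
    unused_tiles = set(tiles)
    thread_tile, data_alloc = {}, {}

    for tid, ws in sorted(threads.items(), key=lambda kv: -kv[1]):
        # Pick the free tile whose immediate neighborhood has the most capacity.
        home = max(unused_tiles,
                   key=lambda t: sum(free[u] for u in tiles if dist(t, u) <= 1))
        unused_tiles.remove(home)
        thread_tile[tid] = home

        # Allocate the working set in banks ordered by distance from the thread.
        alloc, remaining = [], ws
        for bank in sorted(tiles, key=lambda t: dist(home, t)):
            if remaining == 0:
                break
            take = min(free[bank], remaining)
            if take:
                free[bank] -= take
                remaining -= take
                alloc.append((bank, take))
        data_alloc[tid] = alloc
    return thread_tile, data_alloc

if __name__ == "__main__":
    threads = {"A": 1500, "B": 300, "C": 900}   # hypothetical working sets
    tt, da = place(threads)
    for tid in threads:
        hops = sum(dist(tt[tid], bank) * lines for bank, lines in da[tid])
        print(tid, "on tile", tt[tid], "avg hops/line =", hops / threads[tid])
```

The point of the toy is that placing threads and data together lets a large working set claim nearby banks before an unrelated thread is scheduled next to them, which a data-only NUCA policy cannot do.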
Current architectural trends of rising on-chip core counts and worsening power-performance penalties...
The scaling of semiconductor technologies is leading to processors with increasing numbers of cores....
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provid...
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability ...
Increases in on-chip communication delay and the large working sets of server and scientific workloa...
Shared last-level caches, widely used in chip-multi-processors (CMPs), face two fundamental limitati...
In current multi-core systems with the shared last level cache (LLC) physically distributed ...
In future multi-cores, large amounts of delay and power will be spent accessing data...
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family ...
The last level on-chip cache (LLC) is becoming bigger and more complex to effectively support the va...
As transistor density continues to grow geometrically, processor manufacturers are already able to p...
Wire delays continue to grow as the dominant component of latency for large caches. A recent work pr...
Chip multiprocessors have the potential to exploit thread level parallelism, particularly attractive...
The effectiveness of the last-level shared cache is crucial to the performance of a multi-core syste...