Abstract: Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement; and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip ...
Journal Article: In future multi-cores, large amounts of delay and power will be spent accessing data...
Abstract: In current multi-core systems with the shared last level cache (LLC) physically distributed ...
The design of the memory hierarchy in a multi-core architecture is a critical component since it mus...
Chip multiprocessors have the potential to exploit thread level parallelism, particularly attractive...
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability ...
A modern high-performance multi-core processor has large shared cache memories. However, simultaneou...
This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads c...
The last level on-chip cache (LLC) is becoming bigger and more complex to effectively support the va...
The effectiveness of the last-level shared cache is crucial to the performance of a multi-core syste...
Abstract: The emergence of multi-core systems opens new opportunities for thread-level parallelism an...
Current architectural trends of rising on-chip core counts and worsening power-performance penalties...
Wire delays continue to grow as the dominant component of latency for large caches. A recent work pr...
At the level of multi-core processors that share the same cache, data sharing among threads which be...
Increases in on-chip communication delay and the large working sets of server and scientific workloa...