This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in cache set usage. CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of cache sets) granularity and periodically recorded at the memory controller to guide the placement process. An incoming block is consequently placed at the cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache.
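To make the placement idea concrete, below is a minimal sketch of pressure-guided placement under the assumptions that the last-level cache is partitioned into a fixed number of cache-set groups and that per-group pressure is tracked with simple counters. The names (NUM_GROUPS, pressure[], ce_record_access, ce_decay_pressure, ce_place_block) are illustrative and not taken from the paper; the actual CE bookkeeping at the memory controller may differ.

```c
/*
 * Illustrative sketch of pressure-guided block placement (not the paper's
 * implementation): pressure is collected per group of cache sets, aged
 * periodically, and an incoming block goes to the least-pressured group.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_GROUPS 16                   /* assumed number of cache-set groups */

static uint64_t pressure[NUM_GROUPS];   /* temporal pressure per group */

/* Record one access that maps to group g (pressure collection). */
static void ce_record_access(int g)
{
    pressure[g]++;
}

/* Periodically age the counters so pressure reflects recent behavior. */
static void ce_decay_pressure(void)
{
    for (int g = 0; g < NUM_GROUPS; g++)
        pressure[g] >>= 1;
}

/* Place an incoming block in the group with the minimum recorded pressure. */
static int ce_place_block(void)
{
    int best = 0;
    for (int g = 1; g < NUM_GROUPS; g++)
        if (pressure[g] < pressure[best])
            best = g;
    pressure[best]++;                   /* new block adds to that group's load */
    return best;                        /* group index chosen for the block */
}

int main(void)
{
    /* Toy usage: skew accesses toward group 0, then place a new block. */
    for (int i = 0; i < 100; i++)
        ce_record_access(0);
    ce_decay_pressure();
    printf("incoming block placed in group %d\n", ce_place_block());
    return 0;
}
```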