Cache working-set adaptation is key as embedded systems move to multiprocessor and simultaneous multithreading (SMT) architectures, because inter-thread pollution harms system performance and battery life. Light-Power NUCA (LP-NUCA) is a working-set adaptive cache that depends on temporal locality to save energy. This work identifies two sources of energy waste in LP-NUCAs: parallel access to the tag and data arrays of the tiles, and low-locality phases with useless block migration. To counteract both issues, we prove that switching to serial access reduces energy without harming performance, and we propose a machine-learning Adaptive Drop Rate (ADR) controller that minimizes the amount of replacement and migration when locality is low. This work ...
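The difference between parallel and serial tag/data access can be illustrated with a toy energy model. This is only a sketch under stated assumptions: the function `access_energy`, the associativity, and the per-array energy values are hypothetical and are not taken from the paper. It assumes that a parallel lookup reads every data way alongside the tags, while a serial lookup reads the tags first and then at most the one matching data way.

```python
def access_energy(ways, e_tag, e_data, hit, serial):
    """Energy of one lookup in a set-associative tile, in arbitrary units.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Every tag in the set is compared in both organizations.
    tag_energy = ways * e_tag
    if serial:
        # Serial: the data array is read only after a tag match,
        # and only the matching way is read.
        data_energy = e_data if hit else 0.0
    else:
        # Parallel: every data way is read speculatively with the tags.
        data_energy = ways * e_data
    return tag_energy + data_energy


# Hypothetical 2-way tile where a data read costs 4x a tag read.
parallel_hit = access_energy(ways=2, e_tag=1.0, e_data=4.0, hit=True, serial=False)
serial_hit = access_energy(ways=2, e_tag=1.0, e_data=4.0, hit=True, serial=True)
serial_miss = access_energy(ways=2, e_tag=1.0, e_data=4.0, hit=False, serial=True)

print(parallel_hit, serial_hit, serial_miss)
```

In this model the serial organization never costs more energy per lookup than the parallel one. The model deliberately ignores latency, which is the side of the trade-off the abstract says is not harmed.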
D-NUCA L2 caches are able to tolerate the increasing wire delay effects due to technology scaling th...
In current multi-core systems with the shared last level cache (LLC) physically distributed ...
High-end embedded processors demand complex on-chip cache hierarchies satisfying several co...
Portable devices often demand powerful processors to run computing intensive applications, such as v...
The number of processor cores and on-chip cache size has been increasing on chip multiprocessors (CM...
The number of processor cores and on-chip cache size has been increasing on chip multiprocessors (CM...
Modern processors dedicate more than half their chip area to large L2 and L3 caches ...
The Last Level Cache (LLC) is a key element to improve application performance in multi-cores. To ha...
D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promotion...
Although multi-threading processors can increase the performance of embedded systems with a minimum ...
The increasing speed-gap between processor and memory and the limited memory bandwidth make last-lev...
Wire delays and leakage energy consumption are both growing problems in the design of large on chip ...
NUCA caches are large L2 on-chip cache memories characterized by multi-bank partitioning a...
Wire delays continue to grow as the dominant component of latency for large caches. A recent work pr...
D-NUCA caches are cache memories that, thanks to banked organization, broadcast search and promoti...