Future CMPs will have more cores and greater on-chip cache capacity. The on-chip cache can either be divided into separate private L2 caches for each core, or treated as a large shared L2 cache. Private caches provide low hit latency but low capacity, while shared caches have higher hit latencies but greater capacity. Victim replication was previously introduced as a way of reducing the average hit latency of a shared cache by allowing a processor to make a replica of a primary cache victim in its local slice of the global L2 cache. Although victim replication performs well on multithreaded and single-threaded codes, it performs worse than the private scheme for multiprogrammed workloads where there is little sharing between the different pr...
We introduce the Execution Migration Machine (EM²), a novel data-centric multicore memory system arc...
For certain applications involving chip multiprocessors with more than 16 cores, a directoryless arc...
One of the key requirements for obtaining high performance from chip multiprocessors (CMPs) is to eff...
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiproc...
Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy...
The microprocessor industry has converged on the chip multiprocessor (CMP) as the architecture of choice to ...
CMPs are now in common use. Increasing core counts implies increasing demands for instruction and da...
Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability ...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
Abstract — Performance tradeoffs between fast data access by local data replication and cache capaci...
Abstract—Multi-threaded applications execute their threads on different cores with their own local c...
The design of the memory hierarchy in a multi-core architecture is a critical component since it mus...
This paper describes Constrained Associative-Mapping-of-Tracking-Entries (C-AMTE), a scalable mechan...