With rapidly increasing parallelism, DRAM performance and power have surfaced as primary constraints, from consumer electronics to high-performance computing (HPC), for a variety of applications, including the bulk-synchronous data-parallel applications that are key drivers for multi-core; examples include image processing, climate modeling, physics simulation, gaming, face recognition, and many others. We present the last-level collective prefetcher (LLCP), a purely hardware last-level cache (LLC) prefetcher that exploits the highly correlated prefetch patterns of data-parallel algorithms, patterns that would otherwise go unrecognized by a prefetcher oblivious to data parallelism. LLCP generates prefetches on behalf of multiple cores in ...
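To make the general idea of cross-core "collective" prefetching concrete, the following is a minimal, purely illustrative sketch. The class name ToyCollectivePrefetcher, the majority-stride heuristic, the degree parameter, and the 64-byte line size are all assumptions introduced here for illustration; this is not the LLCP design described in the abstract above.

```python
# A minimal, purely illustrative sketch of cross-core "collective" prefetching.
# Every name, threshold, and heuristic below is an assumption for illustration
# only; this is NOT the actual LLCP mechanism described in the abstract above.

CACHE_LINE = 64  # assumed cache-line size in bytes


class ToyCollectivePrefetcher:
    """Tracks per-core strides at a shared LLC and, once a majority of cores
    follow the same stride, issues prefetches on behalf of all of them."""

    def __init__(self, degree=2):
        self.degree = degree   # prefetch distance (lines issued per trigger)
        self.last_addr = {}    # core id -> last demand-miss address
        self.stride = {}       # core id -> last observed stride

    def access(self, core, addr):
        """Record a demand miss from `core`; return prefetch addresses."""
        if core in self.last_addr:
            self.stride[core] = addr - self.last_addr[core]
        self.last_addr[core] = addr

        prefetches = []
        strides = list(self.stride.values())
        s = self.stride.get(core, 0)
        # A majority of cores sharing one stride suggests one data-parallel
        # pattern: prefetch the next `degree` lines for every core at once.
        if s != 0 and strides.count(s) > len(strides) // 2:
            for last in self.last_addr.values():
                for k in range(1, self.degree + 1):
                    prefetches.append(last + k * s)
        return prefetches


if __name__ == "__main__":
    pf = ToyCollectivePrefetcher(degree=2)
    # Four cores streaming over disjoint regions with the same stride.
    for i in range(4):
        for core in range(4):
            pf.access(core, core * 0x100000 + i * CACHE_LINE)
    # A miss from core 0 now triggers prefetches for all four cores.
    print(pf.access(0, 4 * CACHE_LINE))
```

The only point of the sketch is that a prefetcher aware of several cores' correlated streams can issue one coordinated batch of prefetches, rather than letting each per-core prefetcher rediscover the same pattern independently.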
This paper presents cooperative prefetching and caching — the use of network-wide global resources (...
A single parallel application running on a multi-core system shows sub-linear speedup becau...
Data prefetching has been considered an effective way to mask data access latency caused by cache mi...
With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a tim...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
In multi-core systems, an application's prefetcher can interfere with the memo...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
Data prefetching is an effective technique to hide memory latency and thus bridge the increasing pro...
Current multicore systems implement multiple hardware prefetchers to tolerate long main memory ...
We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Analysis and simulation of data prefetching algorithms for last-level cache memory. Analysis and com...
Data prefetching has been considered an effective way to bridge the performance gap between processor...
Given the increasing gap between processors and memory, prefetching data into cache become...