Abstract—Although shared memory programming models show good programmability compared to message passing programming models, their implementation by page-based software distributed shared memory systems usually suffers from high memory consistency costs. The major part of these costs is inter-node data transfer for keeping virtual shared memory consistent. A good prefetch strategy can reduce this cost. We develop two prefetch techniques, TReP and HReP, which are based on the execution history of each parallel region. These techniques are evaluated using offline simulations with the NAS Parallel Benchmarks and the LINPACK benchmark. On average, TReP achieves an efficiency (ratio of pages prefetched that were subsequently accessed) of 96% a...
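The efficiency metric defined in the abstract above (the ratio of prefetched pages that were subsequently accessed) can be illustrated with a minimal sketch; the function name and page identifiers are hypothetical, not part of TReP or HReP:

```python
def prefetch_efficiency(prefetched_pages, accessed_pages):
    """Fraction of prefetched pages that were later actually accessed.

    A hypothetical illustration of the metric: efficiency =
    |prefetched ∩ accessed| / |prefetched|.
    """
    prefetched = set(prefetched_pages)
    if not prefetched:
        return 0.0
    return len(prefetched & set(accessed_pages)) / len(prefetched)

# Example: 4 pages prefetched, 3 of them later accessed -> efficiency 0.75.
print(prefetch_efficiency([1, 2, 3, 4], [2, 3, 4, 9]))  # -> 0.75
```

A 96% efficiency, as reported for TReP, would mean nearly every prefetched page was later used, so little inter-node bandwidth is wasted on useless transfers.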
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
Abstract—Prefetch engines working on distributed memory systems behave independently by analyzing the...
Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching...
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
Grantor: University of Toronto. A key obstacle to achieving high performance on software dis...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
Abstract—A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory a...
Data prefetching is an effective technique to hide memory latency and thus bridge the increasing pro...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
This paper presents cooperative prefetching and caching — the use of network-wide global resources (...
Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques ...
Abstract—In this paper, we present an informed prefetching technique called IPODS that makes use of ...