Although shared memory programming models show good programmability compared to message passing programming models, their implementation on page-based software distributed shared memory systems usually suffers from high memory consistency costs. The major part of these costs is internode data transfer for keeping virtual shared memory consistent. A good prefetch strategy can reduce this cost. We develop two prefetch techniques, TReP and HReP, which are based on the execution history of each parallel region. These techniques are evaluated using offline simulations with the NAS Parallel Benchmarks and the LINPACK benchmark. On average, TReP achieves an efficiency (ratio of pages prefetched that were subsequently accessed) of 96% and a coverage...
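The per-region history mechanism and the efficiency and coverage metrics mentioned in this abstract can be made concrete with a small sketch. The Python fragment below is a hypothetical, simplified illustration of a history-based, per-parallel-region prefetcher in the spirit of the TReP/HReP description; the class and method names are invented here and are not taken from the paper.

# Hypothetical sketch of a history-based, per-parallel-region prefetcher,
# illustrating the efficiency and coverage metrics from the abstract above.
# All names are invented for illustration; this is not the authors' code.

from collections import defaultdict

class RegionHistoryPrefetcher:
    """Remembers which shared pages each parallel region accessed on its
    previous execution and prefetches that set when the region starts again."""

    def __init__(self):
        self.history = defaultdict(set)  # region id -> pages accessed on the last run
        self.prefetched = set()          # pages prefetched for the current run
        self.accessed = set()            # pages actually accessed in the current run

    def enter_region(self, region_id):
        # Prefetch every page the region needed the last time it executed.
        self.prefetched = set(self.history[region_id])
        self.accessed = set()
        return self.prefetched           # pages to request from remote nodes

    def record_access(self, page):
        self.accessed.add(page)

    def exit_region(self, region_id):
        # Remember this run's accesses for the region's next execution.
        self.history[region_id] = set(self.accessed)

    def efficiency(self):
        # Ratio of prefetched pages that were subsequently accessed.
        return len(self.prefetched & self.accessed) / len(self.prefetched) if self.prefetched else 1.0

    def coverage(self):
        # Ratio of accessed pages that had been prefetched (the usual sense of
        # "coverage" in the prefetching literature; assumed here).
        return len(self.prefetched & self.accessed) / len(self.accessed) if self.accessed else 1.0

if __name__ == "__main__":
    p = RegionHistoryPrefetcher()
    # First execution of region 0: no history yet, so nothing is prefetched.
    p.enter_region(0)
    for page in (1, 2, 3, 4):
        p.record_access(page)
    p.exit_region(0)
    # Second execution: pages 1-4 are prefetched; the region touches 1-3 and 5.
    p.enter_region(0)
    for page in (1, 2, 3, 5):
        p.record_access(page)
    print(f"efficiency={p.efficiency():.2f}, coverage={p.coverage():.2f}")  # 0.75, 0.75
    p.exit_region(0)

In this sketch, once a region has executed at least once, enter_region() returns the pages recorded on the previous run; comparing that set with the pages actually touched yields the efficiency and coverage figures of the kind the abstract reports.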
This paper presents cooperative prefetching and caching — the use of network-wide global resources (...
In this paper, we present an informed prefetching technique called IPODS that makes use of ...
Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching...
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
A key obstacle to achieving high performance on software dis...
Despite large caches, main-memory access latencies still cause significant performance losses in man...
A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory a...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
Data prefetching is an effective technique to hide memory latency and thus bridge the increasing pro...
Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques ...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
Processor performance has increased far faster than memories have been able to keep up with, forcing...