I/O prefetching has been employed in the past as one of the mechanisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks, because prefetches issued by different CPUs can interact on the shared memory caches in the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact of compiler-directed I/O prefetching, developed originally in the context of sequential execution, on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings benefits, its effectiveness decreases significantly as the number of CPUs is increased; (ii) identify inter-CPU misses due to harmful prefetche... (a minimal sketch of this style of compiler-inserted I/O prefetching appears after this list)
In this paper, we examine the way in which prefetching can exploit parallelism. Prefetching has been st...
Abstract — Parallel I/O prefetching is considered to be effective in improving I/O performance. Howe...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle trad...
Memory latency has always been a major issue in shared-memory multiprocessors and high-speed systems...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of d...
Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen ...
High-performance I/O systems depend on prefetching and caching in order to deliver good performance ...
Abstract—In this paper, we present an informed prefetching technique called IPODS that makes use of ...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
In parallel I/O systems the I/O buffer can be used to improve I/O parallelism by improving I/O laten...
Abstract — Multiple-disk organizations can be used to improve the I/O performance of problems like exte...
Parallel applications can benefit greatly from massive computational capability, but their performan...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
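The first abstract above concerns compiler-directed I/O prefetching, in which the compiler inserts a prefetch request for a future file block ahead of the computation that will consume it, so disk latency overlaps with useful work. As a minimal sketch, assuming a POSIX system and a simple block-at-a-time loop (the block size and the process() routine are illustrative placeholders, not taken from any of the papers listed above), the inserted-prefetch pattern looks roughly like this:

/* Minimal sketch, assuming a POSIX environment; BLOCK_SIZE and
 * process() are illustrative placeholders, not taken from any of
 * the papers summarized above.                                    */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE (1 << 20)            /* 1 MiB per I/O block */

static void process(const char *buf, ssize_t n) {
    (void)buf; (void)n;                 /* stand-in for the real computation */
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BLOCK_SIZE);
    if (!buf) { perror("malloc"); close(fd); return 1; }

    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, buf, BLOCK_SIZE, off)) > 0) {
        /* Prefetch hint inserted ahead of the computation: ask the OS
         * to start reading the next block into the page cache now, so
         * the disk access overlaps with process() on the current block. */
        posix_fadvise(fd, off + BLOCK_SIZE, BLOCK_SIZE, POSIX_FADV_WILLNEED);
        process(buf, n);
        off += n;
    }
    free(buf);
    close(fd);
    return 0;
}

When many CPUs run this pattern against disks served by shared I/O nodes, their independently issued prefetch hints compete in the shared caches of those nodes, which is exactly the kind of interaction the first abstract identifies as a source of harmful prefetches.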