The gap between CPU speed and memory speed in modern computer systems is widening as new generations of hardware are introduced. Loop blocking and prefetching transformations help bridge this gap for regular applications; however, these techniques are poorly suited to irregular applications. This paper investigates using data and computation reordering strategies to improve memory hierarchy utilization for irregular applications on systems with multi-level memory hierarchies. We introduce multi-level blocking as a new computation reordering strategy and present novel integrations of computation and data reordering using space-filling curves. In experiments that applied a combination of data and computation reorderings to two irregular progr...
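The space-filling-curve reordering mentioned in this abstract can be illustrated with a small sketch. It is not the paper's implementation, and it does not show multi-level blocking. It assumes 2-D points with coordinates in [0, 1) and uses the classic bit-manipulation routine for the Hilbert index. It renumbers the points in Hilbert order (data reordering) and then sorts a pairwise interaction list by the new indices (a simple computation reordering). The names `xy2d`, `hilbert_order`, and `reorder` are hypothetical.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Hilbert index of cell (x, y) on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_order(points, order=10):
    """Return point indices sorted by Hilbert index (hypothetical helper)."""
    n = 1 << order
    keys = []
    for px, py in points:
        # Quantize coordinates in [0, 1) onto a 2^order x 2^order grid.
        ix = min(int(px * n), n - 1)
        iy = min(int(py * n), n - 1)
        keys.append(xy2d(n, ix, iy))
    return sorted(range(len(points)), key=keys.__getitem__)


def reorder(points, edges, order=10):
    """Data reordering (renumber points) plus computation reordering (sort edges)."""
    perm = hilbert_order(points, order)
    old_to_new = [0] * len(points)
    for new, old in enumerate(perm):
        old_to_new[old] = new
    # Data reordering: store points in Hilbert order so spatial neighbors
    # tend to be close in memory.
    new_points = [points[old] for old in perm]
    # Computation reordering: visit interactions in order of their
    # renumbered endpoints (each pair normalized as (low, high)).
    new_edges = sorted(
        (min(old_to_new[a], old_to_new[b]), max(old_to_new[a], old_to_new[b]))
        for a, b in edges
    )
    return new_points, new_edges, old_to_new


if __name__ == "__main__":
    pts = [(0.9, 0.1), (0.1, 0.1), (0.1, 0.9), (0.9, 0.9)]
    print(reorder(pts, [(0, 1), (2, 3)], order=1))
```

Sorting edges lexicographically is only one simple choice of computation order; any traversal that follows the new numbering would serve the same illustrative purpose.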
We demonstrate that data reordering can substantially improve the performance of fine-grained irregu...
Irregular applications frequently exhibit poor performance on contemporary computer architectures, i...
Programming languages that provide multidimensional arrays and a flat linear model of memory must im...
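A language with multidimensional arrays and a flat memory needs some function from array indices to linear offsets. The sketch below contrasts the conventional row-major function with Morton (Z-order) bit interleaving, one example of a nonlinear layout chosen here for illustration. It assumes square arrays whose side is a power of two. The function names are hypothetical.

```python
def row_major_offset(i, j, ncols):
    """Offset of element (i, j) in a row-major array with ncols columns."""
    return i * ncols + j


def morton_offset(i, j, bits):
    """Offset of (i, j) in Z-order: interleave the bits of i and j.

    Assumes a 2^bits x 2^bits array. Column bits land in even bit
    positions, row bits in odd bit positions.
    """
    z = 0
    for b in range(bits):
        z |= ((j >> b) & 1) << (2 * b)
        z |= ((i >> b) & 1) << (2 * b + 1)
    return z


if __name__ == "__main__":
    # Offsets of the four elements of the top-left 2x2 tile of a 4x4 array.
    tile = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print([row_major_offset(i, j, 4) for i, j in tile])  # [0, 1, 4, 5]
    print([morton_offset(i, j, 2) for i, j in tile])     # [0, 1, 2, 3]
```

Under Morton order every aligned 2^k x 2^k tile occupies one contiguous range of offsets, whereas row-major keeps only rows contiguous; this is the property that can make hierarchical layouts attractive for blocked traversals.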
The gap between CPU speed and memory speed in modern computer systems is widening as new generation...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
Irregular applications frequently exhibit poor performance on contemporary computer architectures, i...
While many parallel applications exhibit good spatial locality, other important codes in areas like ...
The large latency of memory accesses in modern computers is a key obstacle to achieving high process...
We present a simple and novel framework for generating blocked codes for high-performance machines w...
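For readers unfamiliar with blocked code, the following hand-written loop nest shows what tiling looks like for matrix multiplication. It is only an illustration, not output of the framework this abstract describes. The tile size `tile` is an assumed parameter that would be tuned per cache level, and pure Python will not exhibit the cache benefit a compiled version would.

```python
def matmul_blocked(A, B, tile=32):
    """Compute A @ B with the i, k, j loops tiled by `tile` (lists of lists)."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    # Outer loops step from tile to tile; inner loops stay within one tile,
    # so the touched pieces of A, B, and C can remain resident in cache.
    for ii in range(0, n, tile):
        for kk in range(0, k, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    a_row, c_row = A[i], C[i]
                    for p in range(kk, min(kk + tile, k)):
                        a, b_row = a_row[p], B[p]
                        for j in range(jj, min(jj + tile, m)):
                            c_row[j] += a * b_row[j]
    return C


def matmul_naive(A, B):
    """Reference triple loop used to check the blocked version."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


if __name__ == "__main__":
    A = [[i + j for j in range(5)] for i in range(7)]
    B = [[i * j - 3 for j in range(6)] for i in range(5)]
    assert matmul_blocked(A, B, tile=2) == matmul_naive(A, B)
    print("blocked result matches naive result")
```

Tiling changes only the order in which the same multiply-adds are performed, which is why the blocked result can be checked for exact equality against the naive loop when the inputs are integers.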
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...
"What Mathematics is to Physics, Data traversal is to High-performance computing." The world of Comp...
In computer systems, latency tolerance is the use of concurrency to achieve high performance in spit...
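One standard way to quantify how much concurrency latency tolerance requires is Little's law, which relates the average number of requests in flight $N$ to throughput $\lambda$ and latency $T$. The numbers below are hypothetical, chosen only to show the arithmetic: a 100 ns memory latency, 64-byte cache lines, and a target bandwidth of 25.6 GB/s (that is, 25.6 bytes per nanosecond).

```latex
N = \lambda \, T,
\qquad
\lambda = \frac{25.6\ \text{B/ns}}{64\ \text{B/line}} = 0.4\ \text{lines/ns},
\qquad
N = 0.4\ \text{lines/ns} \times 100\ \text{ns} = 40\ \text{lines in flight}.
```

Under these assumptions, hardware that could keep only 10 misses outstanding would sustain about 0.1 lines/ns, or 6.4 GB/s, a quarter of the target bandwidth.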
In this paper, we propose a novel loop scheduling technique based on multi-dimensional retiming in a...