Exploiting locality is critical to achieving good performance. For regular programs, which operate on dense arrays and matrices, techniques such as loop interchange and tiling have long been known to improve locality and, in turn, performance. However, there has been relatively little work investigating similar locality-improving transformations for irregular programs that operate on trees or graphs. Often, it is not even clear that such transformations are possible. In this paper, we discuss two transformations that can be applied to irregular programs that perform graph traversals. We show that these transformations can be seen as analogs of the popular regular transformations of loop interchange and tiling. We demonstrate the util...
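To make the regular transformation of loop interchange concrete, here is a minimal C sketch on a row-major array. The function names and the summation kernel are hypothetical, chosen only for illustration; both versions compute the same sum, but the interchanged loop order walks memory with unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Column-order traversal of a rows x cols row-major array: the inner loop
 * strides by `cols` elements, so for wide arrays successive accesses land
 * on different cache lines. */
long sum_column_order(const int *a, size_t rows, size_t cols) {
    assert(rows == 0 || cols == 0 || a != NULL);
    long s = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += a[i * cols + j];
    return s;
}

/* After loop interchange: the inner loop walks one row contiguously
 * (unit stride), matching C's row-major layout. */
long sum_row_order(const int *a, size_t rows, size_t cols) {
    assert(rows == 0 || cols == 0 || a != NULL);
    long s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j];
    return s;
}
```

The interchange is legal here because the loop body carries no dependence that the new order would violate; in general a compiler must establish that before swapping loops.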
An important class of scientific codes accesses memory in an irregular manner. Because irregular acce...
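One common way to attack irregular access patterns is computation reordering: when loop iterations are independent, they can be executed in an order that sweeps the indirectly accessed array monotonically. The sketch below assumes a simple gather-and-sum kernel (gather_sum and gather_sum_reordered are hypothetical names) and is not necessarily the technique this particular paper proposes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for size_t values (ascending). */
static int cmp_size(const void *p, const void *q) {
    size_t a = *(const size_t *)p;
    size_t b = *(const size_t *)q;
    return (a > b) - (a < b);
}

/* Irregular gather: x is visited in whatever order idx dictates. */
long gather_sum(const long *x, const size_t *idx, size_t n) {
    assert(n == 0 || (x != NULL && idx != NULL));
    long s = 0;
    for (size_t k = 0; k < n; k++)
        s += x[idx[k]];
    return s;
}

/* Reordered gather: the iterations are independent and integer addition is
 * associative and commutative (barring overflow), so visiting them in sorted
 * index order gives the same result while sweeping x from low to high
 * addresses. The caller's idx array is left unchanged. */
long gather_sum_reordered(const long *x, const size_t *idx, size_t n) {
    if (n == 0)
        return 0;
    size_t *order = malloc(n * sizeof *order);
    if (order == NULL)
        return gather_sum(x, idx, n); /* fall back to the original order */
    memcpy(order, idx, n * sizeof *order);
    qsort(order, n, sizeof *order, cmp_size);
    long s = gather_sum(x, order, n);
    free(order);
    return s;
}
```

In practice such a reordering pays off only when its cost is amortized over many executions of the loop that uses the reordered indices.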
Global locality optimization is a technique for improving the cache performance of a sequence of loo...
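Loop fusion is one standard transformation for improving cache reuse across a sequence of loops; it is shown here as an assumed illustration, not as the specific algorithm this abstract describes. The kernel and function names are hypothetical, and `restrict` encodes the assumption that the arrays do not overlap, which the fusion relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Original sequence of two loops: b[] is produced by the first loop and
 * consumed by the second, so for large n each b[i] may be evicted from
 * cache before the second loop reads it. */
void two_loops(const double *restrict a, double *restrict b,
               double *restrict c, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * 2.0;
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* Fused version: b[i] is consumed immediately after it is produced.
 * Fusion is legal because iteration i of the second loop depends only on
 * iteration i of the first. */
void fused_loop(const double *restrict a, double *restrict b,
                double *restrict c, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * 2.0;
        c[i] = b[i] + 1.0;
    }
}
```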
This paper describes an algorithm to optimize cache locality in scientific codes on uniprocessor and...
Generally applicable techniques for improving locality in irregular programs, which operate over poi...
Many domains in computer science, from data-mining to graphics to computational astrophysics, focus ...
With the advent of programmer-friendly GPU computing environments, there has been much interest in o...
This is a post-peer-review, pre-copyedit version of an article published. The final authenticated ve...
While there has been much work done on analyzing and transforming regular programs that operate over...
Tiling is a well-known loop transformation to improve temporal locality of nested loops. Current com...
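As a minimal sketch of tiling, the C code below blocks a dense matrix multiply C += A * B into T x T tiles so that each tile of B is reused while it is likely still cache resident. The function names and the tile-size parameter T are illustrative assumptions; a real compiler or library would choose T from the cache parameters of the target machine.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference: untiled C += A * B for n x n row-major matrices. */
void matmul_naive(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Tiled: iterate over T x T blocks so a block of B (and the matching parts
 * of A and C) is reused many times before moving on. Edge tiles are clipped
 * with min_sz, so n need not be a multiple of T. */
void matmul_tiled(const double *A, const double *B, double *C,
                  size_t n, size_t T) {
    assert(T > 0);
    for (size_t ii = 0; ii < n; ii += T)
        for (size_t kk = 0; kk < n; kk += T)
            for (size_t jj = 0; jj < n; jj += T)
                for (size_t i = ii; i < min_sz(ii + T, n); i++)
                    for (size_t k = kk; k < min_sz(kk + T, n); k++)
                        for (size_t j = jj; j < min_sz(jj + T, n); j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

For each (i, j), the tiled loop still visits k in ascending order, so both versions add the same terms in the same order and produce identical results.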
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/18...
The delivered performance on modern processors that employ deep memory hierarchies is closely relate...
In the past decade, processors have become significantly faster than memory. Small, fast c...
Over the past 20 years, increases in processor speed have dramatically outstripped performance incre...
Exploiting locality of references has become extremely important in realizing the potential...