Improving program locality has become increasingly important on modern computer systems. An effective strategy is to group computations on the same data so that once the data are loaded into cache, the program performs all operations on them before they are evicted. However, computation regrouping is difficult to automate for programs with complex data and control structures. This paper studies the potential of locality improvement through trace-driven computation regrouping. First, it shows that maximizing locality is different from maximizing parallelism or maximizing cache utilization. The problem is NP-hard even without considering data dependences and cache organization. Then the paper describes a tool that performs const...
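The regrouping idea in the abstract above can be illustrated with a minimal sketch. It assumes a toy model that is not taken from the paper: each computation touches exactly one datum, data dependences are ignored, and the cache is fully associative with LRU replacement. The names `lru_misses` and `regroup_by_data` are hypothetical helpers introduced only for this illustration; this is not the paper's tool or algorithm.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache over a trace of data ids."""
    cache = OrderedDict()
    misses = 0
    for item in trace:
        if item in cache:
            # Hit: mark the datum as most recently used.
            cache.move_to_end(item)
        else:
            misses += 1
            cache[item] = True
            if len(cache) > capacity:
                # Evict the least recently used datum.
                cache.popitem(last=False)
    return misses


def regroup_by_data(trace):
    """Make all computations on the same datum adjacent.

    Groups are ordered by first occurrence. Data dependences are ignored,
    which is an assumption of this sketch, not a property of real programs.
    """
    groups = OrderedDict()
    for item in trace:
        groups.setdefault(item, []).append(item)
    return [x for group in groups.values() for x in group]


# A trace that cycles through three data items with a two-entry cache.
original = ["a", "b", "c", "a", "b", "c", "a", "b", "c"]
regrouped = regroup_by_data(original)

print(lru_misses(original, 2), lru_misses(regrouped, 2))
```

In this toy trace every access misses in the original order, while the regrouped order misses once per datum. Real programs have dependences and multi-datum computations, which is where the hardness result mentioned in the abstract applies.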
Grantor: University of Toronto. This dissertation proposes and evaluates compiler techniques...
As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows...
On modern computers, the performance of programs is often limited by memory latency rather than by ...
Commercial link: http://www.springerlink.de/ ALCHEMY/http://www.springer.com Cache memories were inv...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cac...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
© 1994 ACM. In the past decade, processor speed has become significantly faster than memory speed. S...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2014. As multi-core processors b...
Since the introduction of cache memories in computer architecture, techniques to improve the data lo...
Due to the huge speed gaps in the memory hierarchy of modern computer architectures, it is important...
International audience. Emerging computer architectures will feature drastically decreased flops/byte ...
In memory hierarchies, programs can be speeded up by increasing their degree of locality. This paper...
Exploiting locality of reference is key to realizing high levels of performance on modern p...
The exploitation of locality of reference in shared memory multiprocessors is one of the most import...