We prove theorems that show that if we can reorder a program's memory reference stream such that the reordered memory reference stream satisfies a disjointness property, then the transformed program corresponding to the reordered stream is guaranteed to have fewer misses for any cache with arbitrary size or organization so long as the cache uses the LRU replacement policy. We can apply these results to reorder instructions within a basic block, to transform loops, to reorder blocks within a procedure, or to reorder procedures within a program so as to improve hit rate for any cache that uses LRU replacement. Based on these theorems, we develop algorithmic methods for program transformation to improve cache performance. While there ha...
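The LRU guarantee this abstract describes can be illustrated concretely. The sketch below is a toy model, not the thesis's formalism: the stream contents and the capacity are made up. It counts the misses of a fully associative LRU cache and shows how a reordered reference stream, with all reuses of a block made adjacent, misses less at a given capacity:

```python
from collections import OrderedDict

def lru_misses(stream, capacity):
    """Count the misses a fully associative LRU cache of `capacity`
    blocks incurs on the given reference stream."""
    cache = OrderedDict()  # keys kept in LRU-to-MRU order
    misses = 0
    for addr in stream:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark most recently used
        else:
            misses += 1                    # miss: fetch the block
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[addr] = True
    return misses

# Hypothetical streams: the same references, interleaved vs. reordered
# so that all reuses of a block are adjacent.
interleaved = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
reordered   = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

print(lru_misses(interleaved, 2))  # → 9 (thrashes: every reference misses)
print(lru_misses(reordered, 2))    # → 3 (only the cold misses remain)
```

With capacity 2 the interleaved stream evicts each block just before its reuse, while the reordered stream misses only once per block; the theorems in the abstract characterize when such a reordering can be realized as a legal program transformation.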
A limit to computer system performance is the miss penalty for fetching data and instructions from l...
It has long been known that the quality of the code produced by an optimizing compiler is dependent ...
In the embedded domain, the gap between memory and processor performance and the increase in applica...
The instruction cache is a popular target for optimizations of microprocessor-based systems because ...
Cache memories were inv...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
We explore the use of compiler optimizations, which optimize the layout of instructions in memory. T...
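The kind of instruction-layout optimization this abstract refers to can be sketched as a greedy heuristic in the spirit of profile-guided procedure placement. The procedure names and call counts below are invented, and real algorithms (e.g. chain merging) are more involved:

```python
def greedy_layout(procs, call_counts):
    """Greedily place procedures so the hottest caller/callee pairs
    end up adjacent in memory. A simplified profile-guided heuristic;
    `call_counts` maps (caller, callee) pairs to profiled call counts."""
    # Visit call-graph edges from heaviest to lightest.
    edges = sorted(call_counts.items(), key=lambda kv: kv[1], reverse=True)
    placed = []
    for (caller, callee), _count in edges:
        for p in (caller, callee):
            if p not in placed:
                placed.append(p)
    # Append any procedure never involved in a profiled call.
    placed += [p for p in procs if p not in placed]
    return placed

procs = ['main', 'parse', 'eval', 'log', 'init']
calls = {('main', 'eval'): 900, ('eval', 'parse'): 700, ('main', 'init'): 1}
print(greedy_layout(procs, calls))
# → ['main', 'eval', 'parse', 'init', 'log']
```

Placing hot caller/callee pairs contiguously reduces the chance that they map to conflicting instruction-cache sets and improves spatial locality along hot call paths.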
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetch...
We address the problem of improving cache predictability and performance in embedded systems through...
Thesis (Ph. D.)--University of Washington, 1996. Caches are used in almost every modern processor desig...
We address the problem of improving cache predictability (worst-case performance) and performance in...
In this lecture we consider loop transformations that can be used for cache optimization. The transf...
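One transformation such lectures typically cover is loop interchange. The sketch below shows it on a row-major 2-D array; the array contents are illustrative, and in Python the effect is only structural, since list-of-lists storage is not contiguous the way a C array is:

```python
def col_major_sum(a):
    """Column-major traversal of a row-major array: the inner loop
    strides across rows, touching a different cache line each step."""
    rows, cols = len(a), len(a[0])
    total = 0
    for j in range(cols):
        for i in range(rows):
            total += a[i][j]
    return total

def row_major_sum(a):
    """After loop interchange: the inner loop walks consecutive
    elements of one row, so each fetched cache line is fully used."""
    rows, cols = len(a), len(a[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += a[i][j]
    return total

a = [[1, 2, 3], [4, 5, 6]]
print(col_major_sum(a), row_major_sum(a))  # → 6 21 21... no: both 21
```

Interchange is legal here because addition is associative and the iterations carry no dependence; the cited transformations (interchange, tiling, fusion) differ in which dependences they must preserve.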
An ideal high performance computer includes a fast processor and a multi-million byte memory of comp...