Abstract It has been observed that memory access performance can be improved by restructuring data declarations, using simple transformations such as array dimension padding and inter-array padding (array alignment) to reduce the number of misses in the cache and TLB (translation lookaside buffer). These transformations can be applied to both static and dynamic array variables. In this paper, we provide a padding algorithm for selecting appropriate padding amounts, which takes into account various cache and TLB effects collectively within a single framework. In addition to reducing the number of misses, we identify the importance of reducing the impact of cache miss jamming by spreading cache misses more uniformly across loop iterations. We...
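To make the two transformations named in the abstract concrete, here is a minimal sketch of why padding can remove conflict misses. It is not the paper's padding algorithm. It assumes a hypothetical direct-mapped cache with 32-byte lines and 256 sets (8 KB), 8-byte elements, and row-major layout; the function names `cache_set`, `column_walk_sets`, and `conflicting_pairs` are illustrative only.

```python
LINE_SIZE = 32   # bytes per cache line (assumed, not from the paper)
NUM_SETS = 256   # direct-mapped, so cache size = 32 * 256 = 8192 bytes (assumed)
ELEM_SIZE = 8    # bytes per array element (assumed)


def cache_set(addr):
    """Set index that a byte address maps to in the assumed direct-mapped cache."""
    return (addr // LINE_SIZE) % NUM_SETS


def column_walk_sets(rows, cols, pad=0, base=0):
    """Distinct cache sets touched when walking column 0 of a row-major
    array declared as rows x (cols + pad). `pad` is array dimension padding."""
    row_bytes = (cols + pad) * ELEM_SIZE
    return {cache_set(base + i * row_bytes) for i in range(rows)}


def conflicting_pairs(n, gap=0):
    """Number of indices i for which A[i] and B[i] map to the same set,
    where B is placed right after A plus `gap` bytes of inter-array padding."""
    base_b = n * ELEM_SIZE + gap
    return sum(
        cache_set(i * ELEM_SIZE) == cache_set(base_b + i * ELEM_SIZE)
        for i in range(n)
    )


# Array dimension padding: a row of 1024 elements is exactly one cache size,
# so every row start collides in one set; 4 extra elements per row spread them.
unpadded = column_walk_sets(rows=64, cols=1024)
padded = column_walk_sets(rows=64, cols=1024, pad=4)

# Inter-array padding: two 8 KB arrays laid back to back collide element by
# element; shifting the second array by one cache line removes the overlap.
aligned_conflicts = conflicting_pairs(1024)
shifted_conflicts = conflicting_pairs(1024, gap=LINE_SIZE)
```

Under these assumptions the unpadded column walk touches a single set while the padded one touches 64 distinct sets, and the one-line gap between the arrays eliminates all pairwise collisions. Real padding choices also have to account for associativity, TLB reach, and the access pattern of each loop nest, which is what the abstract's combined framework addresses.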
Thesis (Ph. D.)--University of Washington, 1996. Caches are used in almost every modern processor desig...
This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent ...
This dissertation presents a systematic approach to reduction of cache coherence overhead in shared-...
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map ...
This thesis investigates compiler algorithms to transform program and data to utilize efficiently th...
We address the problem of improving the data cache performance of numerical applications -- specif...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cac...
Abstract—Exploiting locality of reference is key to realizing high levels of performance on modern p...
Tiling is a well-known loop transformation to improve temporal locality of nested loops. Current com...
This paper proposes an optimization by an alternative approach to memory mapping. Caches with low se...