In this paper we present a method for determining the cache performance of the loop nests in a program. The cache-miss data are produced by simulating the loop nest execution on an architecturally parameterized cache simulator. We show that the cache-miss rates are highly non-linear with respect to the ranges of the loops, and correlate well with the performance of the loop nests on actual target machines. The cache-miss ratio is used to guide program optimizations such as loop interchange and iteration-space blocking. It can also be used to provide an estimate for the runtime of a program. Both applications are important in scheduling programs for parallel execution. Presented here are examples of program optimization for several popular p...
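The simulation approach described above can be illustrated with a minimal sketch. It is not the simulator used in the paper: the `Cache` class, the `simulate` helper, and every parameter value (8 KB capacity, 32-byte lines, direct mapping, 8-byte elements, a 256 x 256 row-major array) are assumptions chosen only to show how a parameterized cache model exposes the effect of loop interchange on the miss ratio.

```python
from collections import OrderedDict


class Cache:
    """Set-associative LRU cache model (hypothetical; parameters are illustrative)."""

    def __init__(self, size_bytes, line_bytes, assoc):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Record one memory reference to byte address `addr`."""
        self.accesses += 1
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)          # hit: mark as most recently used
        else:
            self.misses += 1
            if len(cache_set) >= self.assoc:
                cache_set.popitem(last=False)    # evict the least recently used line
            cache_set[line] = True

    def miss_ratio(self):
        return self.misses / self.accesses


def simulate(n, order, cache, elem_bytes=8):
    """Walk an n x n row-major array once.

    order == "ij": i is the outer loop (unit-stride inner loop).
    order == "ji": j is the outer loop (the interchanged nest, stride n).
    """
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if order == "ij" else (inner, outer)
            cache.access((i * n + j) * elem_bytes)
    return cache.miss_ratio()


# Assumed machine: 8 KB direct-mapped cache with 32-byte lines.
ratio_ij = simulate(256, "ij", Cache(8192, 32, 1))
ratio_ji = simulate(256, "ji", Cache(8192, 32, 1))
print(f"i-outer miss ratio: {ratio_ij:.2f}")
print(f"j-outer miss ratio: {ratio_ji:.2f}")
```

Under these assumed parameters the unit-stride nest misses once per four references, while the interchanged nest maps every row to the same few cache sets and misses on every reference. A gap of this kind is what a miss ratio can reveal when it is used to choose between loop orders.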
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
The use of caches poses a difficult tradeoff for architects of real-time systems. While caches provi...
This paper studies the locality analysis problem for shared-memory multiprocessors, a class of para...
We present a novel, compile-time method for determining the cache performance of the loop nests in a...
Cache behavior is complex and inherently unstable, yet it is a critical factor affecting program per...
The time a program takes to execute is significantly affected by the efficiency with which it utilis...
We develop from first principles an exact model of the behavior of loop nests executing in a memory ...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
The technological improvements in silicon manufacturing are yielding vast increases of processor &ap...
Measurements of actual supercomputer cache performance have not been previously undertaken. PFC-Sim i...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
Although modeling of memory caches for the purpose of cache design and process scheduling h...
Cache memories were inv...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
In this paper we will present a solution to the problem of determining loop and data partitions automat...