International audience. Adapting source code to the specifics of its host hardware is one way to optimize software. Doing so lets the software benefit from processors that are designed primarily to improve system performance. To achieve such a software/hardware fit without narrowing the scope of the optimization to a few executions, one needs relevant performance models of the target hardware. This paper proposes a new method to optimize software kernels by considering their data-access mode. The proposed method builds a data-cache-miss model of a given application based on its specific memory-access pattern. We apply our method to evaluate some custom implementations of matrix da...
Data locality is central to modern computer designs. The widening gap between processor speed and me...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
Caches were designed to amortize the cost of memory accesses by moving copies of frequently accessed...
With the software applications increasing in complexity, description of hardware is becoming increas...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
When applying optimizations, a number of decisions are made using fixed strategies, such as always a...
The advent of data proliferation and electronic devices gets low execution time and energy consumpti...
ABSTRACT Goal-Directed Performance Tuning for Scientific Applications by Tien-Pao Shih Chair: Edward...
For many applications, cache misses are the primary performance bottleneck. Even though much researc...
Performance tuning, as carried out by compiler designers and application programmers to close the pe...
The growing gap between processor clock speed and DRAM access time puts new demands on software and ...
Improving cache performance requires understanding cache behavior. However, measuring cache performa...
Many scientific applications handle compressed sparse matrices. Cache behavior during the executio...