Abstract. Many current computer designs employ caches and a hierarchical memory architecture. The speed of a code depends on how well the cache structure is exploited. The number of cache misses provides a better measure for comparing algorithms than the number of multiplies. In this paper, suitable blocking strategies for both structured and unstructured grids are introduced. They improve cache usage without changing the underlying algorithm. In particular, bitwise compatibility is guaranteed between the standard and the high-performance implementations of the algorithms. This is illustrated by comparisons for various multigrid algorithms on a selection of different computers for problems in two and three dimensions. The code restru...
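To make the idea of blocking with bitwise compatibility concrete, here is a minimal sketch. It is not the paper's method: it applies plain loop tiling to a single Jacobi sweep of the 5-point Laplacian, chosen because each point reads only old values, so the traversal order cannot change any result. The function names, the `block` parameter, and the grid layout are all illustrative assumptions, and pure Python will not show the cache benefit itself, only that the reordering leaves the arithmetic untouched.

```python
def jacobi_sweep(u, f, h2):
    """One unblocked Jacobi sweep on an n x n grid (boundary rows/columns are kept fixed)."""
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])
    return new


def jacobi_sweep_blocked(u, f, h2, block=4):
    """The same sweep, visiting interior points tile by tile (hypothetical tile size `block`)."""
    n = len(u)
    new = [row[:] for row in u]
    for ii in range(1, n - 1, block):          # tile origin, row direction
        for jj in range(1, n - 1, block):      # tile origin, column direction
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    # Identical expression and operand order as the unblocked sweep,
                    # so every point gets exactly the same floating-point value.
                    new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                        + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])
    return new
```

Comparing the two results with `==` on the nested lists checks exact floating-point equality rather than agreement within a tolerance. Smoothers such as Gauss-Seidel update values in place, so reordering them safely needs more care than this sketch shows.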
Embedded systems are getting popular in today’s world. They are usually small and thus have a limite...
This dissertation presents a multilevel algorithm to solve constant and variable coefficient elliptic...
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD...
Many current computer designs employ caches and a hierarchical memory architecture. The speed of a...
A Gauss-Seidel variant is developed which maintains data in the L2 cache memory longer than and ru...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids....
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
In previous work, a cache-aware sparse matrix multiplication for linear programming interior point m...
The scalable implementation of multigrid methods for machines with several thousands of processors i...
For many numerical codes the transport of data from main memory to the registers is commonly consid...
In order to mitigate the impact of the growing gap between CPU speed and main memory performance, to...