Many current computer designs employ caches and a hierarchical memory architecture. The speed of a code depends on how well the cache structure is exploited, so the number of cache misses provides a better measure for comparing algorithms than the number of multiplies. In this paper, suitable blocking strategies for both structured and unstructured grids are introduced. They improve cache usage without changing the underlying algorithm; in particular, bitwise compatibility is guaranteed between the standard and the high-performance implementations of the algorithms. This is illustrated by comparisons for various multigrid algorithms on a selection of different computers for problems in two and three dimensions. The code restructuring...
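To make the idea of blocking with bitwise compatibility concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Jacobi-style sweep on a small 2D structured grid, chosen here only because every point reads from the old array and so the traversal order cannot affect the result. The function names `jacobi_sweep` and `jacobi_sweep_blocked` and the `block` parameter are hypothetical.

```python
def jacobi_sweep(u):
    """Reference sweep: visit interior points row by row."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


def jacobi_sweep_blocked(u, block=4):
    """Blocked sweep: visit the same interior points tile by tile.

    Assumption for this sketch: `block` stands in for a tile size that
    would be tuned to the cache in a compiled implementation.
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for ii in range(1, n - 1, block):          # tile origin, rows
        for jj in range(1, m - 1, block):      # tile origin, columns
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    # Same expression, same operand order as the reference,
                    # so each point is computed with identical rounding.
                    new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


if __name__ == "__main__":
    grid = [[((i * 7 + j * 3) % 11) / 3.0 for j in range(10)] for i in range(9)]
    print(jacobi_sweep(grid) == jacobi_sweep_blocked(grid, block=4))
```

Only the order in which points are visited changes; the arithmetic per point does not, which is why the two results compare equal exactly. Pure Python lists will not show a cache speedup, so this sketch illustrates the restructuring only, not the performance effect. Sweeps that update in place, such as Gauss-Seidel, need more care to keep the update order intact and are not covered here.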
In previous work, a cache-aware sparse matrix multiplication for linear programming interior point m...
On modern hardware architectures, the performance of Flux Reconstruction (FR) methods can be limited...
Poster. Why is it important? As the number of cores in a processor scales up, caches would become banked ...
A Gauss-Seidel variant is developed which maintains data in the L2 cache memory longer than and ru...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchie...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids....
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD...
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...
In order to mitigate the impact of the growing gap between CPU speed and main memory performance, to...
Embedded systems are getting popular in today’s world. They are usually small and thus have a limite...