A Gauss-Seidel variant is developed that keeps data in the L2 cache longer than standard implementations do and runs approximately twice as fast. This variant depends on a decomposition of the grid nodes into blocks that fit into cache. We discuss two O(n) algorithms that perform a one-time reordering of the grid nodes and associated operators. Numerical tests demonstrate the speedups possible. A performance analysis tool confirms that our version makes significantly better use of the L2 cache than standard versions.

Keywords: cache, high performance computing, multigrid

1. Introduction. High speed cache memory is commonly used to address the disparity between the speed of a computer's central processing unit and the computer...
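To illustrate the blocking idea mentioned in the abstract, the following is a minimal sketch and not the paper's algorithm. It assumes a 2D Poisson problem on a square structured grid with the 5-point stencil, and it contrasts a standard lexicographic Gauss-Seidel sweep with a sweep that visits the nodes tile by tile. All names (`make_grid`, `relax`, `gauss_seidel_sweep`, `blocked_gauss_seidel_sweep`, `block`) are hypothetical and chosen only for this example.

```python
def make_grid(n, boundary=1.0):
    """Return an n x n grid with `boundary` on the edges and 0.0 inside."""
    return [[boundary if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]


def relax(u, f, h, i, j):
    """In-place Gauss-Seidel update of node (i, j) for the 5-point stencil."""
    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                      + u[i][j - 1] + u[i][j + 1]
                      + h * h * f[i][j])


def gauss_seidel_sweep(u, f, h):
    """Standard sweep: interior nodes visited row by row."""
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            relax(u, f, h, i, j)


def blocked_gauss_seidel_sweep(u, f, h, block):
    """Sweep that finishes one block x block tile before moving on.

    Assumption: `block` is chosen so that a tile and its neighbours fit
    in cache. Visiting nodes tile by tile is still Gauss-Seidel, but with
    a reordered set of unknowns.
    """
    n = len(u)
    for bi in range(1, n - 1, block):
        for bj in range(1, n - 1, block):
            for i in range(bi, min(bi + block, n - 1)):
                for j in range(bj, min(bj + block, n - 1)):
                    relax(u, f, h, i, j)
```

Pure Python will not show a cache effect. The sketch only makes the traversal order concrete. When `block` covers the whole interior, the tiled sweep reduces to the standard one. For smaller tiles the iterates differ from the lexicographic sweep, although both converge to the same discrete solution.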
With rapidly evolving technology, multicore and manycore processors have emerged as promising archit...
Two acceleration techniques, based on additive corrections, are evaluated with a multithreaded 2D Poi...
This dissertation presents a multilevel algorithm to solve constant and variable coefficient elliptic...
Abstract. Many current computer designs employ caches and a hierarchical memory architecture. The s...
Efficient solution of partial differential equations requires a match between the algorithm and the t...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids....
Abstract: Making multigrid algorithms run efficiently on large parallel computers is a challenge. Wi...
© 2017 IEEE. Large-scale applications implemented in today's high performance graph frameworks heavi...
Finite Element problems are often solved using multigrid techniques. The most time consuming part of...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
In previous work, a cache-aware sparse matrix multiplication for linear programming interior point m...
Finding minimal cuts on graphs with a grid-like structure has become a core task for solving many c...