We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression elimination and grouping, and then generates an architecture-specific kernel that solves the same problem. It assumes that the sparsity pattern is fixed, which is a common scenario in many applications in computer graphics and scientific computing. We show that our approach scales to large problems and can achieve speedups of two orders of magnitude on CPUs and three orders of magnitude on GPUs, compared to a set of manually optimized CPU baselines. To demonstrate the practical applicability of our approach, we em...
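The redundant expression elimination step mentioned in the abstract can be illustrated with a small C++ sketch. This is not the authors' generator: it is a minimal hash-consing expression graph, written under the assumption that the computation is unrolled over a fixed sparsity pattern given as (row, column) pairs. The names `ExprGraph`, `Node`, `Op`, and `unroll` are hypothetical and chosen only for illustration. Grouping and architecture-specific kernel emission are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical node kinds for a tiny expression graph (illustrative only).
enum class Op { Input, Add, Mul };

struct Node {
    Op op;
    int lhs;  // left operand index, or the input slot for Op::Input
    int rhs;  // right operand index, -1 for Op::Input
};

// Expression graph that interns nodes: building the same expression twice
// returns the existing node instead of creating a duplicate.
class ExprGraph {
public:
    int input(int slot) { return intern(Op::Input, slot, -1); }
    int add(int a, int b) { return binary(Op::Add, a, b); }
    int mul(int a, int b) { return binary(Op::Mul, a, b); }
    std::size_t size() const { return nodes_.size(); }

    // Evaluate each node exactly once. Operands are always created before
    // their users, so creation order is a valid evaluation order.
    std::vector<double> evaluate(const std::vector<double>& inputs) const {
        std::vector<double> value(nodes_.size());
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            const Node& n = nodes_[i];
            switch (n.op) {
                case Op::Input: value[i] = inputs.at(n.lhs); break;
                case Op::Add:   value[i] = value[n.lhs] + value[n.rhs]; break;
                case Op::Mul:   value[i] = value[n.lhs] * value[n.rhs]; break;
            }
        }
        return value;
    }

private:
    int binary(Op op, int a, int b) {
        if (a > b) std::swap(a, b);  // Add and Mul commute: use a canonical operand order
        return intern(op, a, b);
    }
    int intern(Op op, int lhs, int rhs) {
        const auto key = std::make_tuple(op, lhs, rhs);
        const auto it = index_.find(key);
        if (it != index_.end()) return it->second;  // redundant expression: reuse it
        nodes_.push_back(Node{op, lhs, rhs});
        const int id = static_cast<int>(nodes_.size()) - 1;
        index_.emplace(key, id);
        return id;
    }
    std::vector<Node> nodes_;
    std::map<std::tuple<Op, int, int>, int> index_;
};

// Unroll out[r] = sum over pattern entries (r, c) of x[r] * x[c] into the graph.
// The pattern is assumed fixed, so the loop disappears into straight-line nodes.
std::vector<int> unroll(ExprGraph& g,
                        const std::vector<std::pair<int, int>>& pattern,
                        int rows) {
    std::vector<int> out(rows, -1);
    for (const auto& entry : pattern) {
        const int xr = g.input(entry.first);
        const int xc = g.input(entry.second);
        const int term = g.mul(xr, xc);
        int& acc = out[entry.first];
        acc = (acc < 0) ? term : g.add(acc, term);
    }
    return out;
}
```

For a dense 2x2 pattern, a naive unrolling would create 14 nodes (8 input reads, 4 products, 2 sums), while this sketch stores 7 because the repeated input reads and the product shared by entries (0, 1) and (1, 0) are each built once. Swapping the two operands of a single addition or multiplication does not change the floating-point result; reassociating longer sums would, which is why the sketch only canonicalizes operand order.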
C++ has gained broad acceptance as an object-oriented evolutionary extension to the C language, but ...
The sparse matrix-vector multiplication (SpMV) is a fundamental kernel used in computational...
Expression Templates is a technique that allows linear algebra code to be written in C++ the same way it wou...
The combined exploitation of stream and data parallelism is demonstrating encouraging performance re...
Sparse matrix-vector multiplication is an important computational kernel that tends to per...
We describe an object oriented sparse matrix library in C++ designed for portability and performance...
It is well acknowledged that the dominant mechanism for scaling processor performance has become to ...
We present an automated code engine (ACE) that automatically generates optimized kernels for computi...
Krylov subspace solvers are often the method of choice when solving sparse linear systems i...
Parallelism in today's computer architectures is ubiquitous whether it be in supercomputers, worksta...
Stencil computations are a class of algorithms operating on multi-dimensional arrays, which update a...
Sparse matrix representations are ubiquitous in computational science and machine learning, leading ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
The goal of tasks 5.1 and 5.2 was to extend the code generation pipeline of lbmpy to support a full ...