Abstract—Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortran or C (e.g., based on the Netlib BLAS) and use a state-of-the-art optimizing compiler. However, our experiments show that optimizing compilers often attain only one-quarter the performance of hand-optimized code. In this paper we present a domain-specific compiler for matrix algebra, the Build to Order BLAS (BTO), that reliably achieves hi...
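To make the point about sequences of BLAS calls concrete, here is a minimal sketch of loop fusion. It assumes a hypothetical pair of operations, q = A p and s = Aᵀ r, chosen for illustration only; the function names are invented here and are not taken from BTO or any BLAS library. Called separately, the two matrix-vector products each traverse A once; the fused version reads each element of A a single time and uses it for both results.

```python
def gemv(A, x):
    """Unfused reference: y = A x (one full pass over A)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def gemv_t(A, x):
    """Unfused reference: y = A^T x (a second full pass over A)."""
    y = [0.0] * len(A[0])
    for row, x_i in zip(A, x):
        for j, a_ij in enumerate(row):
            y[j] += a_ij * x_i
    return y


def fused_two_gemv(A, p, r):
    """Hypothetical fused kernel: q = A p and s = A^T r in one pass over A."""
    q = [0.0] * len(A)
    s = [0.0] * len(A[0])
    for i, row in enumerate(A):
        acc = 0.0
        r_i = r[i]
        for j, a_ij in enumerate(row):
            acc += a_ij * p[j]   # contribution to q[i]
            s[j] += a_ij * r_i   # contribution to s[j], reusing the same a_ij
        q[i] = acc
    return q, s


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
p = [1.0, 1.0]
r = [1.0, 0.0, 2.0]
q, s = fused_two_gemv(A, p, r)
```

In pure Python the fused loop is not meaningfully faster; the sketch only shows the memory-traffic argument, namely that A is read once instead of twice. In a compiled kernel, where the matrix-vector product is limited by memory bandwidth, that reduction is where the benefit would come from.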
Algorithm optimisation can be accomplished by an exhaustive search over alternative algorithms for p...
We describe a subset of the level-1, level-2, and level-3 BLAS implemented for each node of the Conn...
The functions library, called Basic Linear Algebra Subprograms (BLAS-1), is considered the programmi...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...
This thesis describes novel techniques and test implementations for optimizing numerically intensive...
It is rare for a programmer to solve a numerical problem with a single library call; most problems r...
This paper describes an approach for the automatic generation and optimization of numerical softwar...
Achieving high performance while reducing power consumption is the key question as technology scali...
This dissertation focuses on the design and the implementation of domain-specific compilers for line...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...
Abstract. Efficient implementation of matrix algebra is important to the performance of many large and...
A technique for optimizing software is proposed that involves the use of a standardized set of compu...
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on...
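As a rough illustration of what a batched interface means, the sketch below applies the same GEMM-style update, C ← αAB + βC, to a list of independent small matrices in one call. The name `batched_gemm`, its argument order, and the list-of-matrices layout are assumptions made for this example; they are not the API proposed in the paper.

```python
def gemm(alpha, A, B, beta, C):
    """Return alpha * A @ B + beta * C for small dense matrices (lists of rows)."""
    m, k, n = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][l] * B[l][j] for l in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def batched_gemm(alpha, As, Bs, beta, Cs):
    """Hypothetical batched call: apply the same GEMM to every (A, B, C) triple."""
    if not (len(As) == len(Bs) == len(Cs)):
        raise ValueError("all batches must contain the same number of matrices")
    # Each problem is independent, so a real implementation could run them in parallel.
    return [gemm(alpha, A, B, beta, C) for A, B, C in zip(As, Bs, Cs)]


As = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
Bs = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
Cs = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]
result = batched_gemm(2, As, Bs, 1, Cs)
```

The design point is that the caller hands over the whole batch at once, which gives the library the freedom to schedule many small problems together rather than paying per-call overhead for each.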
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
Abstract. We present a prototypical linear algebra compiler that automatically exploits domain-speci...