This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. The production of such software for machines ranging from desktop workstations to embedded processors can be a tedious and time-consuming process. The work described here can help automate much of this process. We concentrate on the widely used linear algebra kernels called the Basic Linear Algebra Subprograms (BLAS). In particular, the work presented here is for general matrix multiply, DGEMM. However, much of the technology and approach developed here can be applied to the other Level 3 BLAS, and the general strategy can have an impact o...
Abstract. Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
This paper summarizes the BLAS Technical Forum Standard, a specification of a set of kernel routine...
Abstract—Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) t...
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning ...
This paper discusses the design of linear algebra libraries for high performance computers. Particul...
Abstract. The increasing availability of advanced-architecture computers has a significant effect on a...
Abstract. The introduction of auto-tuning techniques in linear algebra routines using hybrid combinati...
It is rare for a programmer to solve a numerical problem with a single library call; most problems r...
Design by Transformation (DxT) is an approach to software development that encodes domain-specific p...
Parallel architectures are now present in all computing systems, ranging...
Abstract. Design by Transformation (DxT) is an approach to software development that encodes domain-sp...
This dissertation focuses on the design and the implementation of domain-specific compilers for line...
A technique for optimizing software is proposed that involves the use of a standardized set of compu...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...