Abstract. Efficient implementation of matrix algebra is important to the performance of many large and complex physical models. One important tuning technique is loop fusion, which can reduce the amount of data moved between memory and the processor. We have developed the Build to Order (BTO) compiler to automate loop fusion for matrix algebra kernels. In this paper, we present BTO’s analytic memory model, which substantially reduces the number of loop fusion options considered by the compiler. We introduce an example that motivates the inclusion of registers in the model. We demonstrate how the model’s modular design facilitates the addition of register allocation to the model’s set of memory components, improving its accuracy.
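To illustrate the kind of saving loop fusion offers, here is a minimal sketch, not BTO output and not BTO's analytic memory model. It computes two matrix-vector products that share the matrix A, first with separate loops and then with one fused loop. The function names `unfused` and `fused` and the `loads` counter are hypothetical, introduced only for this example; the counter is a crude stand-in for memory traffic and ignores caches and registers.

```python
def unfused(A, x, w):
    """Compute y = A x and z = A w with two separate passes over A."""
    n, m = len(A), len(A[0])
    loads = 0  # counts reads of elements of A (a rough proxy for memory traffic)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] += A[i][j] * x[j]
            loads += 1
    z = [0.0] * n
    for i in range(n):
        for j in range(m):
            z[i] += A[i][j] * w[j]
            loads += 1
    return y, z, loads


def fused(A, x, w):
    """Compute the same y and z, sharing a single pass over A."""
    n, m = len(A), len(A[0])
    loads = 0
    y = [0.0] * n
    z = [0.0] * n
    for i in range(n):
        for j in range(m):
            a = A[i][j]  # read once, reused for both outputs
            loads += 1
            y[i] += a * x[j]
            z[i] += a * w[j]
    return y, z, loads
```

Under this simplified count, the fused version reads each element of A once instead of twice while producing the same results. Whether a given fusion pays off on real hardware depends on the memory hierarchy, which is the question an analytic memory model is meant to answer.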
Abstract. Autotuning technology has emerged recently as a systematic process for evaluating alternat...
During the last half-decade, a number of research efforts have centered around developing software f...
Evaluating an expression in linear algebra using the known Basic Linear Algebra Subprograms libr...
On modern processors, data transfer exceeds floating-point operations as the predominant cost in man...
Abstract—Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) t...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...
Over the past 20 years, increases in processor speed have dramatically outstripped performance incre...
The memory bandwidth largely determines the performance of embedded systems. However, very often com...
Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expe...
This thesis describes novel techniques and test implementations for optimizing numerically intensive...
Over the past decade, microprocessor design strategies have focused on increasing the computational ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/18...
This paper describes an approach for the automatic generation and optimization of numerical softwar...
Almost every modern processor is designed with a memory hierarchy organized into several levels, eac...