Block-recursive codes for dense numerical linear algebra computations appear to be well-suited for execution on machines with deep memory hierarchies because they are effectively blocked for all levels of the hierarchy. In this paper, we describe compiler technology to translate iterative versions of a number of numerical kernels into block-recursive form. We also study the cache behavior and performance of these compiler-generated block-recursive codes.
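As a point of reference, the sketch below contrasts an iterative matrix-multiply kernel with a block-recursive version of the same computation. It is a minimal, hand-written illustration of the target code shape, not the compiler-generated output described in the paper; the names mm_iter and mm_rec, the matrix order N, and the recursion cutoff CUTOFF are illustrative assumptions (square, power-of-two, row-major matrices are assumed to keep the example short).

```c
/* Minimal sketch: iterative vs. block-recursive matrix multiply.
 * Assumes square, power-of-two, row-major matrices; names and sizes
 * are illustrative, not taken from the paper. */
#include <stdio.h>
#include <stdlib.h>

#define N 256        /* assumed matrix order (power of two)      */
#define CUTOFF 32    /* assumed base-case size for the recursion */

/* Iterative kernel: C += A * B over an n x n tile with leading dimension ld. */
static void mm_iter(int n, int ld, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i*ld + j] += A[i*ld + k] * B[k*ld + j];
}

/* Block-recursive kernel: split each operand into four quadrants and
 * recurse.  Each recursion level works on sub-blocks half the size of
 * the level above, so the code is implicitly blocked for every level
 * of the memory hierarchy. */
static void mm_rec(int n, int ld, const double *A, const double *B, double *C)
{
    if (n <= CUTOFF) {            /* small enough: use the iterative kernel */
        mm_iter(n, ld, A, B, C);
        return;
    }
    int h = n / 2;
    const double *A11 = A,        *A12 = A + h,
                 *A21 = A + h*ld, *A22 = A + h*ld + h;
    const double *B11 = B,        *B12 = B + h,
                 *B21 = B + h*ld, *B22 = B + h*ld + h;
    double       *C11 = C,        *C12 = C + h,
                 *C21 = C + h*ld, *C22 = C + h*ld + h;

    /* C11 += A11*B11 + A12*B21, and similarly for the other quadrants. */
    mm_rec(h, ld, A11, B11, C11);  mm_rec(h, ld, A12, B21, C11);
    mm_rec(h, ld, A11, B12, C12);  mm_rec(h, ld, A12, B22, C12);
    mm_rec(h, ld, A21, B11, C21);  mm_rec(h, ld, A22, B21, C21);
    mm_rec(h, ld, A21, B12, C22);  mm_rec(h, ld, A22, B22, C22);
}

int main(void)
{
    double *A = calloc(N*N, sizeof *A);
    double *B = calloc(N*N, sizeof *B);
    double *C = calloc(N*N, sizeof *C);
    for (int i = 0; i < N*N; i++) { A[i] = 1.0; B[i] = 2.0; }

    mm_rec(N, N, A, B, C);
    printf("C[0][0] = %g (expected %g)\n", C[0], 2.0 * N);

    free(A); free(B); free(C);
    return 0;
}
```

Both kernels compute the same result; the recursive version simply reorders the work so that operands at every recursion depth fit successively smaller levels of the cache hierarchy, which is the property the paper's compiler technology aims to produce automatically from the iterative form.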