For good performance of every computer program, good cache and TLB utilization is crucial. In numerical linear algebra libraries (such as BLAS or LAPACK), good cache utilization is achieved by explicit loop restructuring (mainly loop blocking), but this requires a difficult analysis of memory access patterns. In this paper, we present a recursive ("divide and conquer") implementation of some routines from numerical linear algebra libraries. This implementation achieves good cache and TLB utilization with no need to analyze memory access patterns, because the data are partitioned "naturally" by the recursion.
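To illustrate the idea, the following is a minimal sketch (not taken from the paper) of a divide-and-conquer matrix multiplication in C. The operands are split into quadrants and the recursion continues until a sub-problem fits in cache; the cutoff size, the helper names, and the assumption that the matrix order is a power of two are illustrative choices, not part of the original work.

```c
#include <stddef.h>

#define CUTOFF 64  /* assumed base-case size; would be tuned to the cache */

/* Base case: plain triple loop on operands small enough to stay
 * resident in cache (C += A * B, row-major, leading dimension ld). */
static void mm_base(size_t n, const double *A, const double *B,
                    double *C, size_t ld)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j)
                C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
}

/* Recursive case: split each operand into four n/2 x n/2 quadrants and
 * express the product as eight sub-products.  The recursion partitions
 * the data "naturally", so cache blocking emerges without any explicit
 * analysis of the memory access pattern. */
void mm_recursive(size_t n, const double *A, const double *B,
                  double *C, size_t ld)
{
    if (n <= CUTOFF) {
        mm_base(n, A, B, C, ld);
        return;
    }
    size_t h = n / 2;  /* assume n is a power of two for brevity */
    const double *A11 = A,          *A12 = A + h;
    const double *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B,          *B12 = B + h;
    const double *B21 = B + h * ld, *B22 = B + h * ld + h;
    double       *C11 = C,          *C12 = C + h;
    double       *C21 = C + h * ld, *C22 = C + h * ld + h;

    mm_recursive(h, A11, B11, C11, ld);
    mm_recursive(h, A12, B21, C11, ld);
    mm_recursive(h, A11, B12, C12, ld);
    mm_recursive(h, A12, B22, C12, ld);
    mm_recursive(h, A21, B11, C21, ld);
    mm_recursive(h, A22, B21, C21, ld);
    mm_recursive(h, A21, B12, C22, ld);
    mm_recursive(h, A22, B22, C22, ld);
}
```

Each level of the recursion works on sub-matrices half the size of its parent, so at some depth the working set fits in every level of the memory hierarchy (L1, L2, TLB) without any machine-specific block size being chosen in advance.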