Good cache and TLB utilization is crucial to the performance of any computer program. In numerical linear algebra libraries such as BLAS and LAPACK, good cache utilization is achieved by explicit loop restructuring, mainly loop blocking, which requires a difficult analysis of memory access patterns. In this paper, we present a recursive ("divide and conquer") implementation of several routines from numerical linear algebra libraries. This implementation achieves good cache and TLB utilization without any analysis of memory access patterns, thanks to the "natural" partitioning of the data.