The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-013-0249-6
The introduction of auto-tuning techniques in shared-memory linear algebra routines is analyzed. Information obtained when the routines are installed is used at run time to make decisions that reduce the total execution time. The study is carried out with routines at different levels (matrix multiplication, LU and Cholesky factorizations, and routines for symmetric or general linear systems) and with calls to routines in the LAPACK and PLASMA libraries with multithreaded implementations. Medium NUMA and large cc-NUMA systems are used in the experiments. This variety of routines, libraries and systems allows us to obtain general conclusions abou...
In this paper we study the design of installation routines for linear algebra routines on network...
© 2019 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://...
Abstract. In this article we look at the generation of libraries for dense linear algebra operations...
Abstract. In this work the behavior of the multithreaded implementation of some LAPACK routines on PLA...
Abstract. The introduction of auto-tuning techniques in linear algebra routines using hybrid combinati...
Abstract. The use of an OpenMP compiler optimized for the corresponding multicore system is a good opt...
It is rare for a programmer to solve a numerical problem with a single library call; most problems r...
This dissertation details contributions made by the author to the field of computer science while wo...
This paper describes an approach for the automatic generation and optimization of numerical softwar...
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distribu...
This paper presents a self-optimization methodology for parallel linear algebra routines on heterog...
This paper discusses the design of linear algebra libraries for high performance computers. Particul...
Abstract. In this document we present a new approach to developing sequential and parallel dense line...
This report summarizes the progress made as part of a one year lab-directed research and development...
With the emergence of thread-level parallelism as the primary means for continued improvement of per...