Abstract. The introduction of auto-tuning techniques in linear algebra routines using hybrid combinations of multiple CPU and GPU computing resources is analyzed. Basic models of the execution time, together with information obtained during the installation of the routines, are used to optimize the execution time through a balanced assignment of the work to the computing components in the system. The study is carried out with a basic kernel (matrix-matrix multiplication) and a higher-level routine (LU factorization) using GPUs and the host multicore processor. Satisfactory results are obtained, with experimental execution times close to the lowest experimentally achievable.
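The idea of balancing work with a basic execution-time model can be illustrated with a minimal sketch. It is not the model used in the paper: it assumes a matrix product C = A·B whose columns of B are divided between the CPU and one GPU, a fixed sustained throughput for each device, and a single transfer bandwidth. The function `split_columns` and all of its parameters (`cpu_gflops`, `gpu_gflops`, `link_gb_per_s`) are hypothetical names standing in for values that would be measured at installation time.

```python
def split_columns(m, k, n, cpu_gflops, gpu_gflops, link_gb_per_s, bytes_per_elem=8):
    """Pick how many of the n columns of B go to the GPU.

    Hypothetical model (an assumption, not the paper's): CPU and GPU work
    concurrently, so the predicted time is the maximum of the two device times.
    Returns (n_cpu, n_gpu, predicted_seconds).
    """

    def t_cpu(nc):
        # 2*m*k*nc floating-point operations at the measured CPU rate
        return 2.0 * m * k * nc / (cpu_gflops * 1e9)

    def t_gpu(ng):
        if ng == 0:
            return 0.0
        flops = 2.0 * m * k * ng
        # A (m x k) and the B slice (k x ng) are sent; the C slice (m x ng) returns
        moved = bytes_per_elem * (m * k + k * ng + m * ng)
        return flops / (gpu_gflops * 1e9) + moved / (link_gb_per_s * 1e9)

    # Exhaustive search over every split; cheap because the model is closed-form
    n_gpu = min(range(n + 1), key=lambda ng: max(t_cpu(n - ng), t_gpu(ng)))
    n_cpu = n - n_gpu
    return n_cpu, n_gpu, max(t_cpu(n_cpu), t_gpu(n_gpu))


if __name__ == "__main__":
    # Illustrative rates only; real values would come from installation benchmarks
    n_cpu, n_gpu, seconds = split_columns(4096, 4096, 4096, 50.0, 300.0, 8.0)
    print(f"CPU columns: {n_cpu}, GPU columns: {n_gpu}, predicted time: {seconds:.4f} s")
```

Because the search includes the CPU-only and GPU-only splits, the predicted time is never worse than either extreme under this model. How close the prediction comes to measured times depends on how well the constant-rate assumption holds for the problem sizes of interest.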
Abstract. LU factorization is the most computationally intensive step in solving systems of linear equ...
Graphics hardware’s performance is advancing much faster than the performance of conventional microp...
Abstract. Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is...
In this chapter, we present a hybridization methodology for the development of linear algebra softwa...
The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-013-0249-6 The in...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
Abstract. If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced ...
This paper describes an approach for the automatic generation and optimization of numerical softwar...
The development of high performance dense linear algebra (DLA) critically depends on highly optimize...
Abstract. In this work the behavior of the multithreaded implementation of some LAPACK routines on PLA...
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on ...