Abstract. This paper analyzes the introduction of auto-tuning techniques into linear algebra routines that use hybrid combinations of multiple CPU and GPU computing resources. Basic models of the execution time, together with information obtained when the routines are installed, are used to minimize the execution time by balancing the assignment of work among the computing components of the system. The study uses a basic kernel (matrix-matrix multiplication) and a higher-level routine (LU factorization), running on GPUs and the host multicore processor. The results are satisfactory: the experimental execution times are close to the lowest ones achieved experimentally.
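The abstract does not state the execution-time model itself. The following is only an illustrative sketch of what a "balanced assignation of the work" can look like. It assumes a hypothetical linear model t_i(c) = a_i + b_i·c for each device i computing c columns of C = AB. The coefficients a_i (fixed overhead) and b_i (seconds per column) are assumed to have been measured at installation time. The columns are then split so that every device is predicted to finish at the same time. The `Device` class, the `balance_columns` function and all numbers are hypothetical, and this is not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical per-device model: t(c) = overhead + cost_per_col * c,
    with both coefficients assumed to come from installation-time runs."""
    name: str
    overhead: float      # fixed cost in seconds (e.g. launch or transfer setup)
    cost_per_col: float  # modelled seconds per assigned column of C


def balance_columns(devices, n):
    """Split n columns among devices so the modelled times are (nearly) equal.

    Solving a_i + b_i * c_i = T for every device with sum(c_i) = n gives
    T = (n + sum(a_i / b_i)) / sum(1 / b_i). A device whose share would be
    negative is too slow to help, so it is dropped and T is recomputed.
    """
    active = list(devices)
    while True:
        inv = sum(1.0 / d.cost_per_col for d in active)
        t = (n + sum(d.overhead / d.cost_per_col for d in active)) / inv
        shares = {d.name: (t - d.overhead) / d.cost_per_col for d in active}
        slow = [d for d in active if shares[d.name] < 0]
        if not slow:
            break
        active = [d for d in active if d not in slow]
    # Round down, then give the leftover columns to the largest remainders.
    counts = {name: int(s) for name, s in shares.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(shares, key=lambda k: shares[k] - counts[k], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    # Devices dropped as too slow receive no work.
    for d in devices:
        counts.setdefault(d.name, 0)
    return counts


if __name__ == "__main__":
    cpu = Device("cpu", overhead=0.0, cost_per_col=4e-4)
    gpu = Device("gpu", overhead=0.01, cost_per_col=1e-4)
    print(balance_columns([cpu, gpu], 4096))  # prints {'cpu': 839, 'gpu': 3257}
```

With these made-up coefficients, the GPU receives roughly four times as many columns as the CPU, minus a small correction for its fixed overhead. A real auto-tuner would replace the linear model with whatever model and installation-time measurements the routine actually uses.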
We present several algorithms to compute the solution of a linear system of equations on a GPU, as ...
Abstract. Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
In this chapter, we present a hybridization methodology for the development of linear algebra softwa...
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is...
The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-013-0249-6. The in...
This paper describes an approach for the automatic generation and optimization of numerical softwar...
Graphics hardware's performance is advancing much faster than the performance of conventional microp...
To exploit the tremendous computing power of graphics hardware and to automatically adapt t...
Graphics hardware’s performance is advancing much faster than the performance of conventional microp...
The development of high performance dense linear algebra (DLA) critically depends on highly optimize...
Dense linear algebra (DLA) is one of the seven most important kernels in high-performance computing. ...
Abstract. The use of an OpenMP compiler optimized for the corresponding multicore system is a good opt...
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning ...
Abstract. In this work the behavior of the multithreaded implementation of some LAPACK routines on PLA...