To exploit the tremendous computing power of graphics hardware and to adapt automatically to the fast and frequent changes in its architecture and performance characteristics, this paper implements an automatic tuning system that generates high-performance matrix-multiplication implementations on graphics hardware. The automatic tuning system uses a parameterized code generator to generate multiple versions of matrix multiplication, whose performance is empirically evaluated by actual execution on the target platform. An ad-hoc search engine is employed to search over the implementation space for the version that yields the best performance. In contrast to similar systems on CPUs, which utilize cache blocking, register tiling, in-...
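The generate, execute, and search loop described above can be illustrated with a minimal sketch. It is not the paper's system: it runs on the CPU in pure Python, uses loop-blocking sizes as a stand-in tuning parameter (the abstract does not list the GPU parameters that are actually tuned), and replaces the ad-hoc search engine with an exhaustive sweep. All names (`make_blocked_matmul`, `time_variant`, `autotune`) are hypothetical.

```python
import itertools
import random
import time


def make_blocked_matmul(bi, bj, bk):
    """Parameterized 'code generator': return one matmul variant
    specialized for the blocking factors (bi, bj, bk)."""
    def matmul(a, b, n):
        c = [[0.0] * n for _ in range(n)]
        for i0 in range(0, n, bi):
            for j0 in range(0, n, bj):
                for k0 in range(0, n, bk):
                    for i in range(i0, min(i0 + bi, n)):
                        row_a, row_c = a[i], c[i]
                        for k in range(k0, min(k0 + bk, n)):
                            a_ik, row_b = row_a[k], b[k]
                            for j in range(j0, min(j0 + bj, n)):
                                row_c[j] += a_ik * row_b[j]
        return c
    return matmul


def time_variant(fn, a, b, n, repeats=2):
    """Empirical evaluation: run the variant and keep the best wall time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(a, b, n)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(n=32, block_sizes=(4, 8, 16), seed=0):
    """Search: time every parameter combination (exhaustive sweep, an
    assumption standing in for the paper's ad-hoc search engine)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    results = {}
    for params in itertools.product(block_sizes, repeat=3):
        results[params] = time_variant(make_blocked_matmul(*params), a, b, n)
    best = min(results, key=results.get)
    return best, results
```

Calling `autotune()` returns the fastest parameter tuple together with the timing of every variant. The winning tuple depends on the machine it runs on, which is the reason for measuring rather than predicting.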
The introduction of auto-tuning techniques in linear algebra routines using hybrid combinati...
As computer architectures become more complex, the task of writing efficient programs to best utilize...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
Graphics hardware's performance is advancing much faster than the performance of conventional microp...
Graphics hardware’s performance is advancing much faster than the performance of conventional microp...
The use of auto-tuning techniques in a matrix multiplication routine for hybrid CPU+GPU platforms is...
This paper presents results of our study on double-precision general matrix-matrix multiplic...
We propose a simple method to implement floating-point vector math operations and matrix m...
The development of high-performance dense linear algebra (DLA) critically depends on highly optimize...
Current compilers cannot generate code that can compete with hand-tuned code i...
Today’s hardware platforms have parallel processing capabilities and many parallel programming model...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
For the past decade, power/energy consumption has become a limiting factor for large-scale and embed...
For the past decade, power/energy consumption has become a limiting factor for large-scale and embed...
In high-performance computing, excellent node-level performance is required for the efficient use of...