The performance portability of OpenCL kernel implementations for common memory-bandwidth-limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. As a consequence, it is demonstrated that the optimization of a single kernel is often sufficient to obtain good performance for a large class of more complicated operations.
Heterogeneous computing platforms are becoming increasingly important in supercomputing. Many system...
In this work, we evaluate OpenCL as a programming tool for developing performance-portable applicati...
OpenCL (Open Computing Language) is a heterogeneous programming framework for developing application...
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
This work describes my solution to the performance portability problem: between CPUs and GPUs in par...
Accelerator processors allow energy-efficient computation at high performance, especially for comput...
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using...
Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framew...
In the last few years, the computing industry has changed its course from ever higher clock speeds t...
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common pro...
High performance parallel computing was something exclusive for expensive specialized hardware some ...
This paper investigates the development of a molecular dynamics code that is highly portable between...
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level be...
Recent developments in processor architecture have settled a shift from sequential processing to par...