This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. We present an account of the design decisions made during the development of this code, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single-source application is benchmarked on a number of different architectures and is shown to be 1.3-1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3-3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL's device fissioning capability, demonstr...
Utilizing heterogeneous platforms for computation has become a general trend, making the portability...
This paper investigates the development of a molecular dynamics code that is highly portable between...
The decline of Moore’s law has led to a fundamental shift in the design of micro-processor architect...
Recent developments in processor architecture have cemented a shift from sequential processing to par...
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common pro...
This work describes my solution to the performance portability problem: between CPUs and GPUs in par...
The proliferation of heterogeneous computing systems presents the parallel computing community with ...
One of the benefits of programming in OpenCL is platform portability. That is, an OpenCL program tha...
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using...
In the last few years, the computing industry has changed its course from ever higher clock speeds t...
Recently, OpenCL, a new open programming standard for GPGPU programming, has become availa...
This whitepaper investigates the parallel performance of a sample application that implements an app...
OpenCL (Open Computing Language) is a heterogeneous programming framework for developing application...
OpenCL has become the de facto data-parallel programming model for parallel devices in today’s high-...