In this paper, we present OMPSs, a programming model based on OpenMP and StarSs that can also incorporate OpenCL or CUDA kernels. We evaluate the proposal on three different architectures (SMP, Cell/B.E., and GPUs), showing the wide usefulness of the approach. The evaluation uses four benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results with those obtained by running the same benchmarks written in OpenCL on the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment and is more flexible in exploiting multiple accelerators. In addition, the simplicity of its annotations increases programmer productivity.
Peer Reviewed
In this work, we evaluate OpenCL as a programming tool for developing performance-portable applicati...
Application development for modern high-performance systems with many cores, i.e., comprising multip...
Graphics Processing Units (GPU) have been widely adopted to accelerate the execution of HPC workload...
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both p...
The advent of heterogeneous computing has forced programmers to use platform specific programming pa...
Shared memory multicore processor technology is pervasive in mainstream computing. This new architec...
With heterogeneous computing becoming mainstream, researchers and software vendors have been trying ...
This paper advances the state-of-the-art in programming models for exploiting task-level parallelis...
This paper advances the state-of-the-art in programming models for exploiting task-level parallelism...
This paper presents the OmpSs approach to deal with heterogeneous programming on GPU and FPGA accele...
Shared memory multi-core processor technology has seen a drastic development with faster and increasi...
OpenMP [13] is the dominant programming model for shared-memory parallelism in C, C++ and Fortran du...
Recent developments in processor architecture have settled a shift from sequential processing to par...
Abstract. Shared memory multicore processor technology is pervasive in mainstream computing. This ne...