We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping offered through OpenMP's map clause, i.e., allocation and freeing, becomes another runtime challenge besides expensive data transfer and the actual computation. We therefore propose to retain the memory management responsibility on the host: a caching mechanism acquires memory on the accelerator for all CPU...
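The idea of keeping memory management on the host can be illustrated with a minimal sketch. The abstract above is truncated, so this is not the authors' implementation: the class name `DeviceBufferCache`, its methods, and the exact-size reuse policy are all assumptions made for illustration. The sketch shows a thread-safe pool that hands out previously released buffers instead of allocating and freeing on every offload. The default allocator is host `malloc`/`free` as a stand-in so the code runs without a GPU; on a real system one could pass wrappers around OpenMP's `omp_target_alloc` and `omp_target_free` instead.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side cache of accelerator buffers (illustrative names only).
class DeviceBufferCache {
public:
  using Alloc = std::function<void*(std::size_t)>;
  using Free  = std::function<void(void*)>;

  // Assumption: host malloc/free stand in for a device allocator here.
  explicit DeviceBufferCache(
      Alloc alloc = [](std::size_t n) { return std::malloc(n); },
      Free release = [](void* p) { std::free(p); })
      : alloc_(std::move(alloc)), release_(std::move(release)) {}

  // Frees only idle buffers; buffers still handed out remain the caller's job.
  ~DeviceBufferCache() {
    for (auto& entry : idle_)
      for (void* p : entry.second) release_(p);
  }

  DeviceBufferCache(const DeviceBufferCache&) = delete;
  DeviceBufferCache& operator=(const DeviceBufferCache&) = delete;

  // Return an idle buffer of exactly this size, or allocate a new one.
  void* acquire(std::size_t bytes) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = idle_.find(bytes);
      if (it != idle_.end() && !it->second.empty()) {
        void* p = it->second.back();
        it->second.pop_back();
        return p;
      }
      ++allocations_;
    }
    return alloc_(bytes);  // the real allocation runs outside the lock
  }

  // Put a buffer back into the pool instead of freeing it.
  void release(void* p, std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    idle_[bytes].push_back(p);
  }

  // Number of times the underlying allocator was actually invoked.
  std::size_t allocations() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return allocations_;
  }

private:
  Alloc alloc_;
  Free release_;
  mutable std::mutex mutex_;
  std::unordered_map<std::size_t, std::vector<void*>> idle_;
  std::size_t allocations_ = 0;
};
```

In this sketch, each CPU thread would call `acquire` before offloading a set of patches and `release` afterwards, so repeated offloads of equally sized patch sets reuse the same buffers. Keying the pool by exact size is a simplification chosen for brevity; a real cache might round sizes up or bound the pool's total footprint.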
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA para...
With the introduction of more powerful and massively parallel embedded processors, embedded systems ...
This paper presents GPU parallelization for a computational fluid dynamics solver which works on a m...
Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programm...
Graphics processing units (GPUs) have become prevalent in modern computing systems. While their high...
Graphics Processing Units (GPUs) have been widely adopted to accelerate the execution of HPC workload...
GPU devices are becoming a common element in current HPC platforms due to their high performance-per...
Multi-GPU machines are being increasingly used in high performance computing. These machines are bei...
High compute-density with massive thread-level parallelism of Graphics Processing Units (GPUs) is be...
A block-structured adaptive mesh refinement (AMR) technique has been used to obtain numerical soluti...
With the massive multithreading execution feature, graphics processing units (GPUs) have b...
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm...
Heterogeneous systems equipped with traditional processors (CPUs) and graphics processing units (GPU...
Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms for...