Abstract
This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of the various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple CUDA streams per GPU to hide the data transfer cost associated with the halo computation on each 2D plane. Moreover, peer-to-peer data transfers between GPUs can be used to reduce the impact of the 3D volumetric data shuffles that must be performed whenever the grid partitioning has to change. We have investigated the performance improvement of 2- and 4-GPU implementations that are applicable to 3D aniso...
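To make the cost of the repartitioning shuffle concrete, the following Python sketch counts how many grid cells each GPU must hand to every other GPU when the partitioning changes. It assumes, as a simplification not stated in the abstract, that the volume is cut into contiguous slabs along one axis (one slab per GPU) and that a change of partitioning means re-cutting along a different axis. The function names (`slab`, `owned_box`, `shuffle_matrix`, `moved_fraction`) are hypothetical; the sketch only counts cells and does not move data or model the OpenMP/CUDA-stream overlap of halo transfers.

```python
from itertools import product


def slab(n, parts, p):
    """Half-open range [begin, end) owned by part p when n cells are split
    into `parts` contiguous, near-equal slabs."""
    base, rem = divmod(n, parts)
    begin = p * base + min(p, rem)
    return begin, begin + base + (1 if p < rem else 0)


def overlap(a, b):
    """Length of the intersection of two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def owned_box(dims, ndev, axis, dev):
    """3D box (one range per axis) owned by GPU `dev` when the grid is cut
    into slabs along `axis`; the other two axes are kept whole."""
    box = [(0, n) for n in dims]
    box[axis] = slab(dims[axis], ndev, dev)
    return box


def shuffle_matrix(dims, ndev, old_axis, new_axis):
    """cells[src][dst] = number of cells GPU src owns under the old
    partitioning that GPU dst owns under the new one."""
    cells = [[0] * ndev for _ in range(ndev)]
    for src, dst in product(range(ndev), repeat=2):
        old_box = owned_box(dims, ndev, old_axis, src)
        new_box = owned_box(dims, ndev, new_axis, dst)
        vol = 1
        for ra, rb in zip(old_box, new_box):
            vol *= overlap(ra, rb)
        cells[src][dst] = vol
    return cells


def moved_fraction(dims, ndev, old_axis, new_axis):
    """Fraction of the volume that must leave its current GPU (off-diagonal)."""
    m = shuffle_matrix(dims, ndev, old_axis, new_axis)
    total = sum(map(sum, m))
    kept = sum(m[i][i] for i in range(ndev))
    return (total - kept) / total


if __name__ == "__main__":
    for ndev in (2, 4):
        print(ndev, moved_fraction((64, 64, 64), ndev, old_axis=0, new_axis=2))
```

On a 64 x 64 x 64 grid this reports a moved fraction of 0.5 for two GPUs and 0.75 for four; with N equal slabs, (N-1)/N of the volume leaves its GPU. The off-diagonal entries of `shuffle_matrix` are the device-to-device transfers that peer-to-peer copies (for example the CUDA runtime's `cudaMemcpyPeerAsync`) can perform directly between GPUs when peer access is available.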
This paper proposes a parallel scheme for accelerating parameter sweep applications on a graphics pr...
High performance computing using graphics processing units (GPUs) is gaining popularity in the scien...
A brief overview of recent general tasks for parallel computation on graphics processing units is ...
We present a novel method for 3D anisotropic front propagation and apply it to the simulatio...
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA para...
Graphics Processing Units (GPUs) are a fast-evolving architecture. Over the last decade their progra...
The research presented in this thesis investigates parallel implementations of the Fast Sweeping Met...
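For readers unfamiliar with the serial baseline that such parallel work starts from, here is a minimal Python sketch of the classic first-order fast sweeping method for the 2D eikonal equation. It assumes a uniform grid with spacing `h`, a constant isotropic speed, point sources, and the standard Godunov upwind update; the function name and parameters are hypothetical, and this is not the thesis's parallel implementation.

```python
import math


def fast_sweep_eikonal(nx, ny, h, sources, speed=1.0, max_rounds=50):
    """First-order fast sweeping for |grad T| = 1/speed on an nx-by-ny grid
    with spacing h; T = 0 at the `sources` cells (list of (i, j) tuples)."""
    INF = math.inf
    T = [[INF] * ny for _ in range(nx)]
    fixed = set(sources)
    for i, j in fixed:
        T[i][j] = 0.0
    f = h / speed  # travel time across one cell
    # The four alternating Gauss-Seidel orderings of a 2D sweep.
    orders = [
        (range(nx), range(ny)),
        (range(nx - 1, -1, -1), range(ny)),
        (range(nx - 1, -1, -1), range(ny - 1, -1, -1)),
        (range(nx), range(ny - 1, -1, -1)),
    ]
    for _ in range(max_rounds):
        changed = False
        for xs, ys in orders:
            for i in xs:
                for j in ys:
                    if (i, j) in fixed:
                        continue
                    # Smallest neighbour value along each axis (upwind choice).
                    a = min(T[i - 1][j] if i > 0 else INF,
                            T[i + 1][j] if i < nx - 1 else INF)
                    b = min(T[i][j - 1] if j > 0 else INF,
                            T[i][j + 1] if j < ny - 1 else INF)
                    if math.isinf(a) and math.isinf(b):
                        continue  # the front has not reached this cell yet
                    if abs(a - b) >= f:
                        t = min(a, b) + f  # one-sided update
                    else:
                        # Larger root of (t - a)^2 + (t - b)^2 = f^2.
                        t = (a + b + math.sqrt(2 * f * f - (a - b) ** 2)) / 2
                    if t < T[i][j]:  # values only ever decrease
                        T[i][j] = t
                        changed = True
        if not changed:
            break  # a full round of four sweeps changed nothing
    return T
```

Each sweep is a sequential Gauss-Seidel pass: a cell reads neighbours that were updated earlier in the same sweep, and that dependency is exactly what a parallel variant has to restructure.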
As one of the arbitrary Lagrangian–Eulerian methods, the material point method (MPM) owns in...
The graphics processing unit (GPU) provides a significant amount of computation power that can be used ...
Stochastic simulations involve multiple replications in order to build confide...
Static non-linear Hamilton-Jacobi equations are often used to describe a propagating front. Advanced...
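As a concrete instance, the best-known static Hamilton-Jacobi equation for front propagation is the eikonal equation, in which T(x) is the arrival time of a front that leaves a source set Γ and moves with speed F(x) > 0 inside a domain Ω. The isotropic case and one common anisotropic form (with M(x) symmetric positive definite) are written below; choosing M = F² I recovers the isotropic equation.

```latex
\begin{aligned}
H\bigl(\mathbf{x}, \nabla T(\mathbf{x})\bigr) &= 0 \quad \text{in } \Omega, \qquad T = 0 \ \text{on } \Gamma,\\
H_{\mathrm{iso}}(\mathbf{x}, \mathbf{p}) &= F(\mathbf{x})\,\lVert \mathbf{p} \rVert - 1,\\
H_{\mathrm{aniso}}(\mathbf{x}, \mathbf{p}) &= \sqrt{\mathbf{p}^{\mathsf{T}} M(\mathbf{x})\,\mathbf{p}} - 1.
\end{aligned}
```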
Computers almost always contain one or more central processing units (CPUs), each of which processes ...
Achieving maximum parallel performance on multi-core CPUs and many-core GPUs is a challenging task d...