We present preliminary results of a GPU port of all the main Gadget3 modules (gravity computation, SPH density computation, SPH hydrodynamic force, and thermal conduction) using OpenACC directives. We assign one GPU to each MPI rank and exploit both host and accelerator capabilities by overlapping computations on the CPUs and GPUs: while the GPUs asynchronously compute interactions between particles within their MPI ranks, the CPUs perform tree walks and MPI communication of neighbouring particles. We profile various portions of the code to understand the origin of our speedup and find that peak speedup is not achieved because of time-steps with few active particles. We run a hydrodynamic cosmological simulation from the Magneticu...
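The CPU/GPU overlap described above can be illustrated with a minimal C sketch using OpenACC's `async` clause and `wait` directive. This is not Gadget3 code: the function names (`local_interactions_async`, `host_side_work`, `overlapped_step`), the toy pairwise sum standing in for the particle interactions, and the buffer sum standing in for the tree walk and MPI exchange are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the offloaded particle interactions.
 * The toy kernel computes acc[i] = sum_j (pos[j] - pos[i]); it is a
 * placeholder, not the gravity or SPH kernel of the paper.
 * async(1) queues the data transfers and the kernel on queue 1 and
 * returns control to the host immediately. */
void local_interactions_async(const double *restrict pos,
                              double *restrict acc, int n)
{
    #pragma acc parallel loop async(1) copyin(pos[0:n]) copyout(acc[0:n])
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j) {
            sum += pos[j] - pos[i];
        }
        acc[i] = sum;
    }
}

/* Hypothetical stand-in for the work that stays on the CPU
 * (tree walk, MPI exchange of neighbouring particles). */
double host_side_work(const double *halo, int m)
{
    double total = 0.0;
    for (int i = 0; i < m; ++i) {
        total += halo[i];
    }
    return total;
}

/* One overlapped step: launch the device work, do the host work
 * while it runs, then synchronise before acc is read on the host. */
double overlapped_step(const double *pos, double *acc, int n,
                       const double *halo, int m)
{
    assert(n >= 0 && m >= 0);

    local_interactions_async(pos, acc, n);   /* device, non-blocking */
    double host_result = host_side_work(halo, m);  /* host, meanwhile */

    #pragma acc wait(1)   /* acc[] is valid on the host only after this */
    return host_result;
}
```

A compiler without OpenACC support ignores the pragmas, so the same source runs serially on the host and gives the same results; the overlap only materialises when the code is built with an OpenACC-capable compiler. The key design point is that `acc` must not be read on the host before the `wait`.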
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on ...
OpenACC allows offloading calculations to accelerators such as GPGPUs using short compiler directive...
OpenACC is a directive-based programming standard that aims to provide a highly portable programming model ...
The Frisch-Hasslacher-Pomeau (FHP) model is a lattice gas cellular automaton designed to simulate fl...
OpenACC, a directive-based GPU programming standard, is emerging as a promising technology for massi...
Lagrangian models are fundamental tools to study atmospheric transport processes and for practical a...
An increasing number of massively-parallel supercomputers are based on heterogeneous node architectu...
For many years now, processor vendors increased the performance of their devices by adding more core...
This report highlights our work on improving GPU parallelization by supporting compute nodes with mu...
Multi-GPU implementations of the Lattice Boltzmann method are of practical interest as they allow th...
JuSPIC is a particle-in-cell (PIC) code, developed in the Simulation Lab for Plasma Physics of the J...
OpenACC compilers allow one to use Graphics Processing Units without having to write explicit CUDA c...
This 2h tutorial interactively teaches how to handle the massive computing performance offered by PO...
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and c...