We present initial comparative performance results for the Intel Many Integrated Core (MIC) architecture, Sandy Bridge (SB), and a graphics processing unit (GPU). A 1D explicit electrostatic particle-in-cell code is used to simulate a two-stream instability in a plasma. We compare the computation times for various numbers of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum of 128 threads. Parallelization and vectorization on the GPU are achieved by modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto-vectorization and optimization-level compiler options. Our results show that the MIC is several times slower than SB for a single thread, and it becomes faster than SB...
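For readers unfamiliar with the method, one time step of a 1D explicit electrostatic particle-in-cell scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the code benchmarked in the paper: it assumes normalized units (plasma frequency 1, electron charge-to-mass ratio -1), a periodic domain, linear (cloud-in-cell) weighting, a fixed neutralizing ion background, and a simple trapezoidal integration of Gauss's law. All function names (`init_two_stream`, `deposit_charge`, `solve_field`, `push`) are hypothetical. The per-particle loops in `deposit_charge` and `push` are the kind of loops one would typically distribute over OpenMP threads or CUDA threads.

```python
import math


def init_two_stream(n, length, v0, amp=1e-3):
    """Two counter-streaming electron beams with a small sinusoidal position perturbation."""
    x, v = [], []
    for i in range(n):
        x0 = (i + 0.5) * length / n
        x0 += amp * math.sin(2.0 * math.pi * x0 / length)
        x.append(x0 % length)
        v.append(v0 if i % 2 == 0 else -v0)  # alternate beam direction
    return x, v


def deposit_charge(x, ng, length):
    """Linear weighting of electron charge onto a periodic grid of ng nodes."""
    dx = length / ng
    q = -length / len(x)   # macro-particle charge: mean electron charge density is -1
    rho = [1.0] * ng       # assumed uniform, immobile ion background
    for xp in x:           # particle loop (candidate for thread parallelism)
        s = xp / dx
        j = int(s) % ng
        w = s - int(s)
        rho[j] += q * (1.0 - w) / dx
        rho[(j + 1) % ng] += q * w / dx
    return rho


def solve_field(rho, length):
    """Integrate dE/dx = rho with the trapezoidal rule, then remove the mean field."""
    ng = len(rho)
    dx = length / ng
    e = [0.0] * ng
    for j in range(1, ng):
        e[j] = e[j - 1] + 0.5 * (rho[j - 1] + rho[j]) * dx
    mean = sum(e) / ng
    return [ej - mean for ej in e]


def push(x, v, e, length, dt):
    """Gather the field at each particle and advance velocity, then position, in place."""
    ng = len(e)
    dx = length / ng
    for i in range(len(x)):  # particle loop (candidate for thread parallelism)
        s = x[i] / dx
        j = int(s) % ng
        w = s - int(s)
        ep = e[j] * (1.0 - w) + e[(j + 1) % ng] * w
        v[i] -= ep * dt                      # q/m = -1 in these units
        x[i] = (x[i] + v[i] * dt) % length   # periodic boundary


def run(steps, n=256, ng=32, length=2.0 * math.pi, v0=0.2, dt=0.1):
    """Advance the system and return final positions, velocities, and field."""
    x, v = init_two_stream(n, length, v0)
    e = [0.0] * ng
    for _ in range(steps):
        rho = deposit_charge(x, ng, length)
        e = solve_field(rho, length)
        push(x, v, e, length, dt)
    return x, v, e
```

Calling `run(100)` advances the two beams for 100 steps; the parameter values are illustrative defaults rather than the settings used in the study.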
We have developed a new algorithm for implementation of plasma particle-in-cell (PIC) simulation code...
Particle-in-cell (PIC) simulations are some of the most computationally intensive calculations carr...
The Monte Carlo neutron transport method can be naturally parallelized by multi-core architectures d...
We have designed Particle-in-Cell algorithms for emerging architectures. These algorithms sh...
Three dimensional particle-in-cell laser-plasma simulation is an important area of computational phy...
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is inves...
This thesis discusses how to optimize computational physics software for speed through maximizing th...
JuSPIC is a particle-in-cell (PIC) code, developed in the Simulation Lab for Plasma Physics of the J...
This paper reports on an in-depth evaluation of the performance portability frameworks Kokkos and RA...
Shared memory multi-core processor technology has seen a drastic development with faster and increasi...
The emergence of modern many-core architectures that offer an extreme level of parallelism makes met...
A serial source code for simulating a supersonic ejector flow is accelerated using parallelization b...
The computational effort of 3D image reconstruction in Computed Tomography (CT) has required special...
Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and desig...
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and c...