We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on NVIDIA Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈ 30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster. © 2012 Springer-Verlag
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for mas...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...
We describe the implementation and optimization of a state-of-the-art Lattice Boltzmann code for com...
We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluste...
We describe the implementation of a thermal compressible Lattice Boltzmann algorithm on an NVIDIA Te...
We describe the implementation of a thermal compressible Lattice Boltzmann algorithm on an NVIDIA T...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key adva...
Accelerators are an increasingly common option to boost performance of codes that require extensive ...
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic...
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively p...