We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices that we have adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at approximately 30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster.
Keywords: Computational fluid-dynamics – Lattice Boltzmann methods – GP-GPU computing
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively p...
We describe the implementation and optimization of a state-of-the-art Lattice Boltzmann code for com...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...
We describe the implementation of a thermal compressible Lattice Boltzmann algorithm on an NVIDIA Te...
Accelerators are an increasingly common option to boost performance of codes that require extensive ...
Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key adva...
Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsic...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
The lattice Boltzmann method has become a valuable tool in computational fluid dynamics, one of the ...