This is the author accepted manuscript. The final version is available from MDPI via the DOI in this record.
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, providing a huge computational potential. Programming them in a scalable way exploiting the maximal performance introduces numerous challenges such as optimizations for different computing devices, dealing with multiple levels of parallelism, the application of different programming models, work distribution, and hiding of communication with computation. We utilize the lattice Boltzmann method for fluid flow as a representative of a scientific computing application and develop a ho...
Accelerators are an increasingly common option to boost performance of codes that require extensive ...
In this paper we report on our early experience on porting, optimizing and benchmarking a...
Today, we are living a growing demand of larger and more efficient computational resources from the ...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...
This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann meth...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
The popularization of graphic processing units (GPUs) has led to their extensive us...
With computer simulations real world phenomena can be analyzed in great detail. Computational fluid ...
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic...
The lattice Boltzmann method (LBM) is an innovative and promising approach in ...
We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) meth...
We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluste...
Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key adva...
We present a software approach to hardware-oriented numerics which builds upon an augmented, previou...
In this paper we address the problem of identifying and exploiting techniques that optimize the perf...