The architecture of high-performance computing systems is becoming increasingly heterogeneous, as accelerators play an ever more important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires specific programming environments. Programming frameworks that support code portable across different high-performance architectures have recently appeared, but one must carefully weigh the cost of portability against computing efficiency and find a reasonable trade-off. In this paper we address precisely this issue, using as a test bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art pro...
We present an auto-tuning approach to optimize application performance on emerging multicore archite...
In this paper we report on our early experience in porting, optimizing and benchmarking a Lattice Bo...
We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluste...
The scientific computing community has been closely connected with high-performance computing (HPC), ...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
In this paper we address the problem of identifying and exploiting techniques that optimize the perf...
Lattice Boltzmann Methods (LBM) are an established mesoscopic approach for simulating a wide variety...
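Since most of the works listed here revolve around Lattice Boltzmann codes, a minimal sketch of the core computation may help make the later performance discussions concrete. The block below shows one collide-and-stream update of a D2Q9 lattice with a single-relaxation-time (BGK) collision on a small periodic grid, in plain Python. The lattice choice, the BGK operator, the relaxation time `tau`, periodic boundaries, and all function names are illustrative assumptions; this is not the scheme of any specific paper above (in particular, not the compressible variant mentioned in one of the abstracts), and an optimized code would use tuned data layouts and OpenCL or CUDA kernels instead of Python loops.

```python
# Minimal D2Q9 lattice Boltzmann step (BGK collision + streaming), pure Python.
# Illustrative sketch under stated assumptions: D2Q9 lattice, single relaxation
# time `tau`, periodic boundaries. Not the scheme of any specific paper above.

# D2Q9 discrete velocities (cx, cy) and their quadrature weights
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations f_i^eq at one lattice site."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def macroscopic(fs):
    """Density and velocity (zeroth and first moments) at one site."""
    rho = sum(fs)
    ux = sum(f * cx for f, (cx, _) in zip(fs, C)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(fs, C)) / rho
    return rho, ux, uy


def step(f, nx, ny, tau):
    """One collide-and-stream update; f[x][y] holds the 9 populations of site (x, y)."""
    # Collision (purely local): relax every population towards its equilibrium value
    post = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            fs = f[x][y]
            feq = equilibrium(*macroscopic(fs))
            post[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(fs, feq)]
    # Streaming (nearest neighbour): population i moves to (x + cx, y + cy), wrapping around
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                new[(x + cx) % nx][(y + cy) % ny][i] = post[x][y][i]
    return new


def total_mass(f):
    """Sum of all populations over the grid (conserved by collision and streaming)."""
    return sum(sum(site) for column in f for site in column)


if __name__ == "__main__":
    nx, ny, tau = 8, 8, 0.8
    # Fluid at rest with a small density bump at site (4, 4)
    f = [[equilibrium(1.1 if (x, y) == (4, 4) else 1.0, 0.0, 0.0)
          for y in range(ny)] for x in range(nx)]
    m0 = total_mass(f)
    for _ in range(10):
        f = step(f, nx, ny, tau)
    print(f"mass before: {m0:.6f}  after: {total_mass(f):.6f}")
```

The collision only reads data at the site being updated, and streaming only moves data to nearest neighbours; this regular, local structure is why LBM kernels map naturally onto GPUs and other accelerators.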
With computer simulations, real-world phenomena can be analyzed in great detail. Computational fluid ...
Accelerators are quickly emerging as the leading technology to further boost computing performance;...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...