We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Clovertown, AMD Opteron X2, Sun Niagara2, and STI Cell, as well as the single-core Intel Itanium2. Rather than hand-tuning LBMHD for each system, we develop a code generator...
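The search-based idea summarized above (generate several variants of a kernel, time each one on the target machine, keep the fastest) can be illustrated with a minimal Python sketch. It is not the paper's code generator: the toy kernel, the tuning parameters `block` and `unroll`, and the function names `make_variant`, `time_kernel`, and `autotune` are all hypothetical and chosen only for illustration.

```python
import time
from itertools import product


def make_variant(block, unroll):
    """Build one kernel variant for a (block, unroll) choice.

    Both parameters are hypothetical tuning knobs; a real generator
    would emit specialized source code instead of a Python closure.
    """
    def kernel(data):
        n = len(data)
        total = 0.0
        for start in range(0, n, block):          # process one block at a time
            stop = min(start + block, n)
            i = start
            while i + unroll <= stop:             # "unrolled" main loop
                for k in range(unroll):
                    total += data[i + k] * 0.5
                i += unroll
            while i < stop:                       # remainder of the block
                total += data[i] * 0.5
                i += 1
        return total
    return kernel


def time_kernel(kernel, data, repeats=3):
    """Return the best wall-clock time over a few repeated runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - t0)
    return best


def autotune(search_space, data, timer=time_kernel):
    """Exhaustively time every parameter combination and keep the fastest."""
    names = list(search_space)
    results = {}
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        results[values] = timer(make_variant(**config), data)
    best_values = min(results, key=results.get)
    return dict(zip(names, best_values)), results


if __name__ == "__main__":
    sample = [float(i) for i in range(100_000)]
    best, timings = autotune({"block": [64, 256, 1024], "unroll": [1, 2, 4]}, sample)
    print("fastest configuration on this machine:", best)
```

The winning configuration depends on the machine the search runs on, which is the point of the approach: the same generator and search can be rerun on each system instead of hand-tuning the kernel for each one.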
In this paper we report on our early experience on porting, optimizing and benchmarking a Lattice Bo...
We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and c...
We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 ...
In this paper we address the problem of identifying and exploiting techniques that optimize the perf...
2012-04-27. The shift to many-core architecture design paradigm in computer market has provided unprec...
When designing and implementing highly efficient scientific applications for parallel computers such a...
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively p...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
Algorithms with low computational intensity show interesting performance and power consumption beha...