We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticated sequential auto-tuning techniques including loop transformations, virtual vectorization, and use of ISA-specific intrinsics. Next, we present a variety of parallel optimization approaches including programming model exploration (at ...
GPUs deliver higher performance than traditional processors, offering remarkable energy efficiency, ...
When designing and implementing highly efficient scientific applications for parallel computers such...
This thesis presents efforts to attain efficient Lattice Boltzmann simulations on large-scale parall...
We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 ...
We present an auto-tuning approach to optimize application performance on emerging multicore archite...
In this paper we address the problem of identifying and exploiting techniques that optimize the perf...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...
We describe the implementation and optimization of a state-of-the-art Lattice Boltzmann code for com...
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively p...
The lattice Boltzmann method is increasingly important in facilitating large-scale fluid d...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to bui...