Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms form an important class of applications for many scientific and engineering domains. The key difficulty in achieving higher performance from these applications is the indirect accesses that lead to data races when parallelized. Current methods for handling such data races reduce parallelism and yield sub-optimal performance. Particularly on modern many-core architectures such as GPUs, which have increasing core and thread counts, reducing data movement and exploiting memory locality is vital for good performance. In this work we present novel locality-exploiting optimizations for the efficient execution of unstructured-mesh algorithms on GP...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
General-purpose computing on GPUs is widely adopted for scientific applications, providing inexpensi...
In this work, we investigate the global memory access mechanism on recent GP...
This paper addresses two key parallelization challenges the unstructured mesh-based ocean mo...
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threa...
The present work investigates the feasibility of finite element methods and topology optimi...
In unstructured finite volume method, loop on different mesh components such as cells, face...
Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems o...
Graphical processing units (GPUs) have recently attracted attention for scientific applications such...
This work presents a parallel implementation of density-based topology optimization using distribute...
Achieving optimal performance on the latest multi-core and many-core architectures increasingly depe...
In the last three years, GPUs are more and more being used for general purpose applications instead ...
The present work investigates the feasibility of finite element methods and topology optimization for...
Many real-life applications of processor-arrays suffer from memory bandwidth limitations. In many ca...