This work analyzes the most advanced features of NVIDIA's Kepler GPU, mainly dynamic parallelism, which launches kernels from within the GPU, and thread scheduling via Hyper-Q. We illustrate several ways to exploit those features in a code that computes Zernike moments, using two different formulations: direct and iterative. This lets us compare how well each formulation deploys parallelism on the new generation of GPUs. The direct alternative tries to maximize parallelism, while the iterative one increases the operational intensity by reusing results from previous iterations. This has allowed us to increase the speed-up factor attained on Fermi architectures versus a code written in C and executed on a multicore CPU. We also succeed...
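The contrast between the direct and iterative formulations can be illustrated on the radial polynomial that Zernike moments are built from. The sketch below is a CPU-only illustration in Python, not the authors' GPU code: the function names are hypothetical, and the particular recurrence used for the iterative variant (each coefficient derived from the previous one) is an assumption chosen to show reuse of earlier results; the abstract does not say which recurrence the paper uses.

```python
from math import comb, factorial


def radial_direct(n, m, rho):
    """Radial polynomial R_n^m(rho); every term is computed from scratch.

    Terms are independent of each other, so they could be evaluated in
    parallel, at the cost of recomputing factorials for each term.
    Assumes n >= |m| and n - |m| even.
    """
    m = abs(m)
    a, b = (n + m) // 2, (n - m) // 2
    total = 0.0
    for k in range(b + 1):
        coeff = (-1) ** k * factorial(n - k) / (
            factorial(k) * factorial(a - k) * factorial(b - k)
        )
        total += coeff * rho ** (n - 2 * k)
    return total


def radial_iterative(n, m, rho):
    """Same polynomial, but each coefficient reuses the previous one.

    Uses c_0 = C(n, a) and c_{k+1} = -c_k (a-k)(b-k) / ((k+1)(n-k)),
    which removes the factorials but makes the terms sequential.
    """
    m = abs(m)
    a, b = (n + m) // 2, (n - m) // 2
    coeff = float(comb(n, a))
    total = 0.0
    for k in range(b + 1):
        total += coeff * rho ** (n - 2 * k)
        if k < b:  # no further coefficient is needed after the last term
            coeff *= -(a - k) * (b - k) / ((k + 1) * (n - k))
    return total
```

Both functions return the same value for a given (n, m, rho); the direct form does more arithmetic per term but exposes independent work, while the iterative form does less arithmetic by carrying state from one step to the next.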
The lag of parallel programming models and languages behind the advance of heterogeneous many-core p...
This work aims to enable Swift to efficiently use accelerators (such as NVIDIA GPUs) to fur...
In our previous work, we have provided tools for an efficient characterization of biomedical images ...
Since the first version of CUDA was launched, many improvements have been made in GPU computing. Every new ...
We optimized the Moving Particle Simulation (MPS) method for the Kepler GPU. Solving sparse matrix o...
The programming of GPUs (Graphics Processing Units) is ready for practical applications; the largest...
Just five years ago, NVIDIA introduced CUDA, the Compute Unified Device Architecture, which signifi...
Aiming to understand how high-performance CUDA programming can be done for NVIDIA's new Kepler archi...
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programm...
In this article, we describe an improved cell-list approach designed to match the Kepler architectur...
The effective parallelization of applications exhibiting irregular nested parallelism is still an op...
Maintaining computational load balance is important to the performant behavior of codes which operat...
Heterogeneous computing nodes are now pervasive throughout computing, and GPUs have emerged as a lea...