The objective of this thesis is the development, implementation, and optimization of a GPU execution model extension that efficiently supports the time-varying, nested, fine-grained dynamic parallelism occurring in irregular, data-intensive applications. These dynamically formed pockets of structured parallelism can use the recently introduced device-side nested kernel launch capabilities on GPUs. However, the low utilization of GPU resources and the high cost of device kernel launches still make it difficult to harness dynamic parallelism on GPUs. This thesis therefore presents an extension to the common Bulk Synchronous Parallel (BSP) GPU execution model, Dynamic Thread Block Launch (DTBL), which provides the capability of spawning li...
Heterogeneous computing nodes are now pervasive throughout computing, and GPUs have emerged as a lea...
Future high-performance computing systems will be hybrid; they will include processors optimized for...
In this paper, we characterize and analyze an increasingly popular style of programming for the GPU ...
Thread-parallel hardware, such as Graphics Processing Units (GPUs), greatly outperforms CPUs in provid...
As modern GPU workloads become larger and more complex, there is an ever-increasing demand for GPU c...
Supporting dynamic parallelism is important for GPUs to benefit a broad range of applications. There ...
Many applications with regular parallelism have been shown to benefit from using Graphics Processing...
Power-performance efficiency has become a central focus that is challenging in heterogeneous process...
In recent processor development, we have witnessed the integration of GPUs and CPUs into a single ch...
Early programs for Graphics Processing Unit (GPU) acceleration were based on a flat, bulk parallel ...
GPU devices are becoming a common element in current HPC platforms due to their high performance-per...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
As the complexity of applications continues to grow, each new generation of GPUs has been equipped w...
GPUs have gained tremendous popularity in a broad range of application domains. These appli...
The effective use of GPUs for accelerating applications depends on a number of factors including eff...