Despite the growing popularity of GPGPU programming, there is not yet a portable and formally specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that ...
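The idea of discovering occupancy and then restricting the barrier to the discovered workgroups can be illustrated with a small simulation. The sketch below is not the OpenCL implementation described in the abstract: Python threads stand in for workgroups, a semaphore models the occupancy bound, and the names `OccupancyPoll`, `run_kernel`, `num_workgroups` and `occupancy` are hypothetical. It assumes a poll-style scheme in which a workgroup participates only if it arrives before any participant has closed the poll.

```python
import threading


class OccupancyPoll:
    """Hypothetical poll: workgroups that arrive while it is open participate."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open = True
        self.count = 0

    def join(self):
        """Return a participant id, or None if the poll is already closed."""
        with self._lock:
            if not self._open:
                return None
            pid = self.count
            self.count += 1
            return pid

    def close(self):
        """Close the poll and return the (now final) participant count."""
        with self._lock:
            self._open = False
            return self.count


def run_kernel(num_workgroups, occupancy):
    """Simulate a launch; return (participants, workgroups past the barrier)."""
    poll = OccupancyPoll()
    # Assumption: at most `occupancy` workgroups are resident at once, and a
    # resident workgroup keeps its slot until it finishes.
    slots = threading.Semaphore(occupancy)
    cond = threading.Condition()
    state = {"arrived": 0, "passed": 0}

    def workgroup():
        with slots:
            if poll.join() is None:
                return  # arrived too late: exits without touching the barrier
            size = poll.close()  # count cannot change after any close
            with cond:
                # Barrier sized by the discovered count, not by num_workgroups.
                state["arrived"] += 1
                cond.notify_all()
                cond.wait_for(lambda: state["arrived"] == size)
                state["passed"] += 1

    threads = [threading.Thread(target=workgroup) for _ in range(num_workgroups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return poll.count, state["passed"]


if __name__ == "__main__":
    participants, passed = run_kernel(num_workgroups=16, occupancy=4)
    print(participants, passed)
```

In this model every participant holds a slot from the moment it joins the poll until it leaves the barrier, so the discovered count can never exceed the occupancy bound and the barrier cannot wait on a workgroup that is not resident. The exact count varies from run to run because it depends on thread scheduling.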
We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU cluste...
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU program...
This article presents a GPU-based single-unit deadlock detection methodology and its algorithm, GPU-...
GPUs are parallel devices that are able to run thousands of independent threads concurrently. Tradi...
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorith...
As GPU availability has increased and programming support has matured, a wider variety of applicatio...
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stag...
The Graphics Processing Unit (GPU) has become a mainstream computing platform for a wide range of ap...
Blocking synchronisation idioms, e.g. mutexes and barriers, play an important role in concurrent pro...
Graphics processing units (GPUs) are becoming increasingly important in today's platforms as their g...
In this dissertation, we explore multiple designs for a Distributed Transactional Memory framework f...
GPUs (Graphics Processing Units) employ a multi-threaded execution model using multiple SIMD cores. ...
As modern GPU workloads become larger and more complex, there is an ever-increasing demand for GPU c...