Abstract—GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, significant speed-ups can be attained for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from serialization of threads when they update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes to use a hash function in both the addressing of the banks and the locks of the scratchpad memory. To measure the effects of these c...
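To illustrate why hashing the bank index can reduce serialization, the following minimal sketch compares a plain modulo bank mapping with an XOR-based hash. It is not the hash function proposed in the paper, which the abstract does not specify. The bank count of 32, the XOR fold, and all function names are assumptions made for illustration only. Note that hashing can only spread accesses to different addresses; threads that update the same address still serialize.

```python
from collections import Counter

# Assumed parameters: 32 banks, word-granular addresses (not from the source).
BANK_BITS = 5
NUM_BANKS = 1 << BANK_BITS


def bank_modulo(word_addr: int) -> int:
    """Conventional mapping: the low address bits select the bank."""
    return word_addr % NUM_BANKS


def bank_xor_hash(word_addr: int) -> int:
    """Hypothetical hash: fold the next-higher bits into the bank index."""
    return (word_addr ^ (word_addr >> BANK_BITS)) % NUM_BANKS


def max_conflict_degree(addresses, bank_fn) -> int:
    """Largest number of distinct addresses that land in one bank.

    A value of 1 means the access is conflict-free; a value of N means
    the accesses to that bank are serialized into N steps.
    """
    distinct = set(addresses)
    counts = Counter(bank_fn(addr) for addr in distinct)
    return max(counts.values())


# One address per thread of a 32-thread group, strided by the bank count.
strided = [NUM_BANKS * lane for lane in range(32)]

if __name__ == "__main__":
    print("modulo  :", max_conflict_degree(strided, bank_modulo))
    print("xor hash:", max_conflict_degree(strided, bank_xor_hash))
```

With the strided pattern, the modulo mapping sends every address to bank 0, whereas the XOR fold spreads them over all 32 banks. The same idea could in principle be applied to the selection of a lock, under the assumption that locks are indexed by address bits in a similar way.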
Heterogeneous processors, consisting of CPU cores and an integrated GPU on the same die, are current...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
Massively parallel processing devices, like Graphics Processing Units (GPUs), have the ability to ac...
General-Purpose Graphics Processing Unit (GPGPU) applications exploit on-chip scratchpad memory avai...
Graphics Processing Units (GPUs) have become the accelerator of choice for data-parallel application...
Scratchpad memories in GPU architectures are employed as software-controlled caches to increase the ...
The parallelization process for a sequential application involves handling of concurrent shared mem...
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), ...
Many applications with regular parallelism have been shown to benefit from using Graphics Processing...
Graphics processing units (GPUs) have become prevalent in modern computing systems. While their high...
Graphics Processing Units (GPUs) are growing increasingly popular as general purpose compute acceler...
In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based pr...
In recent years, Field Programmable Gate Arrays and Graphics Processing Units have become incre...
Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, e...