Efficient Synchronization Primitives for GPUs

Stuart, Jeff A
Owens, John D

Publication date

October 2011

Publisher

eScholarship, University of California

Abstract

In this paper, we revisit the design of synchronization primitives (specifically barriers, mutexes, and semaphores) and how they apply to the GPU. Previous implementations are insufficient because of differences in hardware and programming model between the GPU and the CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well as a method of sleeping on the GPU, by running a set of memory-system benchmarks on two of the most common GPUs in use, the Tesla- and Fermi-class GPUs from NVIDIA. From our results we define higher-level principles that are valid for generic many-core processors, the most important of which is to limit the number of atomic accesses required for a synchronization operation becaus...
