Abstract. Synchronization in parallel programs is a major performance bottleneck. Shared data is protected by locks, and a lot of time is spent in the competition that arises at lock hand-off. During this period, a large amount of traffic targets the cache line holding the lock variable. To be serialized, requests to the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper we focus on systems whose coherence controllers buffer requests. During lock hand-off, only the requests from the winning processor contribute to the progress of the computation, because the winning processor is the only one that will advance the work. This key observation leads us to propose a hardware mechanism ...
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner ...
The thesis investigates non-blocking synchronization in shared memory systems, in particular in high...
The thesis investigates non-blocking synchronization in shared memory systems, in particular in high...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Efficient synchronization is important for achieving good performance in parallel programs, especial...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program perf...
Journal Article: Shared memory programs guarantee the correctness of concurrent accesses to shared dat...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program perf...
Conventional wisdom holds that contention due to busy-wait synchronization is a major obstacle to sc...
Efficient synchronization is important for achieving good performance in parallel programs, especial...
Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared...
Barrier is widely used for synchronization in parallel programs. Since the process arrived earlier t...
The advent of chip multi-processors has led to an increase in computational performance in recent ye...