Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may b...
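The mechanism described above has each processor spin on its own locally accessible flag, and another processor ends that spin with one remote write. A queue-based lock in the style of the MCS lock can illustrate it. What follows is a minimal sketch, not the authors' code. It assumes C11 atomics, POSIX threads, and a cache-coherent machine where a waiting thread's own queue node stays in its cache. The names `mcs_node`, `mcs_lock`, `mcs_acquire`, `mcs_release`, `worker`, and the `run_demo` driver are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread queue node. `locked` is the flag its owner spins
   on; on a cache-coherent machine it stays in the owner's cache meanwhile. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

/* The lock is only a pointer to the last node in the queue (NULL = free). */
typedef struct {
    _Atomic(mcs_node *) tail;
} mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Atomically append ourselves; pred is the previous tail, if any. */
    mcs_node *pred =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (pred != NULL) {
        /* Link behind the predecessor, then spin only on our own flag. */
        atomic_store_explicit(&pred->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            sched_yield(); /* concession for oversubscribed machines */
    }
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No known successor: if we are still the tail, the lock is free. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A thread appended after us but has not linked in yet; wait. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            sched_yield();
    }
    /* The single remote write that terminates the successor's spin. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Hypothetical demo: threads increment a plain counter under the lock. */
typedef struct {
    mcs_lock *lock;
    long *counter; /* protected by *lock */
    int iters;
} demo_args;

static void *worker(void *p) {
    demo_args *a = p;
    mcs_node node; /* this thread's own node; its spin flag lives here */
    atomic_init(&node.next, NULL);
    atomic_init(&node.locked, false);
    for (int i = 0; i < a->iters; i++) {
        mcs_acquire(a->lock, &node);
        (*a->counter)++; /* non-atomic: safe only because the lock serializes it */
        mcs_release(a->lock, &node);
    }
    return NULL;
}

/* Returns the final count, or -1 on bad arguments or thread-creation failure. */
static long run_demo(int nthreads, int iters) {
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    mcs_lock lock;
    atomic_init(&lock.tail, NULL);
    long counter = 0;
    demo_args args = { &lock, &counter, iters };
    int created = 0;
    while (created < nthreads &&
           pthread_create(&tids[created], NULL, worker, &args) == 0)
        created++;
    for (int i = 0; i < created; i++) /* always join what was started */
        pthread_join(tids[i], NULL);
    if (created != nthreads) return -1;
    assert(atomic_load(&lock.tail) == NULL); /* queue fully drained */
    return counter;
}
```

Each waiter polls only its own `locked` field. A release therefore performs exactly one write to another thread's flag, and waiting generates no remote traffic. Because every increment runs under the lock, `run_demo(4, 5000)` is expected to return 20000. The `sched_yield()` calls keep the sketch usable when there are more threads than cores. A pure spin would match the abstract's description more closely.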
Queue-based spin locks allow programs with busy-wait synchronization to scale to very large multipr...
Synchronization primitives for large-scale multiprocessors need to provide low latency and low conte...
Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Conventional wisdom holds that contention due to busy-wait synchronization is a major obstacle to sc...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program perf...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program perf...
Efficient synchronization is important for achieving good performance in parallel programs, especial...
Efficient synchronization is important for achieving good performance in parallel programs, especial...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A ...
It is our thesis that scalable synchronization can be achieved with only minimal hardware support, s...
Synchronization in parallel programs is a major performance bottleneck. Shared data is pro...
Link to published version: http://portal.acm.org/ft_gateway.cfm?id=379566&type=pdf&coll=portal&dl=AC...
Predictable interprocessor synchronization and fast interrupt response are important for real-time s...