As we prepare for the extreme-scale era of computing, communication overhead and synchronization between cores will soon become extremely important. In this work we study three methods of supporting fine-grain synchronization, which allows a task to be broken up into very small units, improving load balancing and reducing lock contention. The methods are hardware support for full/empty bits, compare-and-swap (CAS) emulation of full/empty bits, and dual CAS operations. Roger Golliver’s single CAS implementation is a novel method that chooses a bit pattern to represent the “empty” full/empty bit state. The primary concerns are hardware overhead, efficiency of the synchronization, and energy wasted whil...
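The idea of encoding the "empty" state as a reserved bit pattern can be illustrated with a minimal C++ sketch. This is not the implementation studied in the work; it only assumes that one 64-bit value can be set aside as the sentinel, so that each full/empty operation needs a single CAS on the data word. The sentinel value (`kEmpty`, all ones) and the names `FEWord`, `write_ef`, `try_read_fe`, and `read_fe` are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sentinel: the one 64-bit pattern reserved to mean "empty".
// The abstract does not say which pattern is chosen; all-ones is an assumption.
constexpr std::uint64_t kEmpty = ~std::uint64_t{0};

// A word whose full/empty state is encoded in its value, so no separate tag
// bit and no second CAS are needed. All names here are illustrative.
class FEWord {
public:
    // Write-when-empty: wait until the word is empty, then store the value,
    // which also marks the word full. One CAS per attempt.
    void write_ef(std::uint64_t value) {
        assert(value != kEmpty && "payload must not collide with the sentinel");
        std::uint64_t expected = kEmpty;
        while (!word_.compare_exchange_weak(expected, value,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            expected = kEmpty;          // a failed CAS stores the observed value here
            std::this_thread::yield();  // back off instead of spinning hot
        }
    }

    // Non-blocking read-when-full: on success stores the value in `out`,
    // leaves the word empty, and returns true; returns false if it was empty.
    bool try_read_fe(std::uint64_t& out) {
        std::uint64_t seen = word_.load(std::memory_order_relaxed);
        while (seen != kEmpty) {
            if (word_.compare_exchange_weak(seen, kEmpty,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
                out = seen;
                return true;
            }
            // On failure `seen` holds the current value; the loop re-checks it.
        }
        return false;
    }

    // Blocking read-when-full: spin until a value is available.
    std::uint64_t read_fe() {
        std::uint64_t out = 0;
        while (!try_read_fe(out)) {
            std::this_thread::yield();
        }
        return out;
    }

private:
    std::atomic<std::uint64_t> word_{kEmpty};  // starts in the empty state
};
```

The trade-off in this sketch is that the payload can never equal the sentinel, which costs one representable value in exchange for avoiding a separate state word. The spin loops also show where energy can be wasted while a thread waits, which is one of the concerns the abstract names.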
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. T...
Efficient synchronization can dramatically improve the performance of shared-memory parallel program...
We introduce Transient Blocking Synchronization (TBS), a new approach to hardware synchronization fo...
As multiprocessors scale beyond the limits of a few tens of processors, we must look beyond the ...
This paper addresses the problem of universal synchronization primitives that can support scalable th...
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchroniz...
The Cray XMT architecture has incited curiosity among computer architects and system software design...
Multi-core chip architectures are becoming mainstream, permitting increasing on-chip parallelism th...
It has already been verified that hardware-supported fine-grain synchronization provides a significa...
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. ...
Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared...
Shared memory multiprocessor systems typically provide a set of hardware primitives in order to supp...
Efficient synchronization is an essential component of parallel computing. The designers of traditio...