This paper presents a new methodology for implementing fast synchronization on scalable cache-coherent multiprocessors through the use of hybrid primitives. Hybrid primitives leverage commodity hardware to speed up the execution of the atomic remote Read-Modify-Write (RMW) instructions that synchronization algorithms employ to resolve contention among processors, while exploiting the caches to reduce network traffic during the waiting and release phases of a synchronization primitive. We present a systematic methodology for transforming any synchronization primitive that uses RMW instructions into a hybrid one. We then provide experimental evidence on the effectiveness of using hybrid primitives in the implementation of spin locks, barriers and ...
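The split described above, one atomic RMW to resolve contention followed by cache-friendly waiting, can be illustrated with a conventional ticket lock. This is a minimal sketch and not the paper's hybrid primitive: `TicketLock` and `run_counter_demo` are hypothetical names introduced here, and the code assumes an ordinary cache-coherent machine with C++11 atomics, where `fetch_add` stands in for the remote RMW instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative ticket lock (an assumption for this sketch, not the paper's design).
class TicketLock {
public:
    void lock() {
        // Contention phase: a single fetch-and-add (the RMW step) orders the contenders.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Waiting phase: read-only spinning, which a coherent cache can serve
        // locally until the holder's release invalidates the line.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Release phase: only the current holder writes now_serving_,
        // so a plain load followed by a store is sufficient here.
        const unsigned next =
            now_serving_.load(std::memory_order_relaxed) + 1;
        now_serving_.store(next, std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical helper: each thread increments a shared counter under the lock.
long run_counter_demo(int num_threads, int iters) {
    TicketLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` from a `main` function should return 40000 if mutual exclusion holds. Each acquire issues exactly one RMW, and all remaining traffic during waiting consists of loads, which is the property the abstract attributes to the waiting phase.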
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Synchronization primitives for large scale multiprocessors need to provide low latency and low conte...
As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms ...
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. T...
Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared...
This paper investigates optimized synchronization techniques for shared memory on-chip multiprocesso...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A ...
This paper presents a novel mechanism for barrier synchronization on chip multi-processors (CMPs). B...
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner ...
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. ...
Synchronization is a crucial operation in many parallel applications. Conventional synchronization m...
It has already been verified that hardware-supported fine-grain synchronization provides a significa...
Shared memory multiprocessor systems typically provide a set of hardware primitives in order to supp...