Efficient synchronization mechanisms are crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. We present a novel unbounded SPSC algorithm that reduces raw synchronization latency and speeds up producer-consumer coordination. The algorithm has been extensively tested on a shared-cache multi-core platform, and a proof sketch of its correctness is given. The proposed queues have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel app...
Reordering instructions and data layout can bring significant performance improvement for memory bou...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Using efficient point-to-point communication channels is critical for implementing fine-grained para...
The thesis investigates non-blocking synchronization in shared memory systems, in particular in high...
Data processing pipelines normally use lockless Single-Producer–Single-Consumer (SPSC) queues to eff...
Core-to-core communication is critical to the effective use of multi-core processors. A number of so...
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. T...
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner ...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A...
The rapid progress of multi-/many-core architectures has caused data-intensive parallel applications...
We present a fast and scalable lock algorithm for shared-memory multiprocessors addressing the resou...
A lock-free FIFO queue data structure is presented in this paper. The algorithm supports multiple pr...
In this work, we study the scalability, performance, design and implementation of basic da...