This paper presents a novel mechanism for barrier synchronization on chip multiprocessors (CMPs). By forcing the invalidation of selected I-cache lines, this mechanism starves threads of instructions and thus forces their execution to stop. Threads are released once all of them have entered the barrier. We evaluated this mechanism using SMTSim and report much better (and, most importantly, flatter) performance than lock-based barriers supported by existing microprocessors.
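For reference, the lock-based software barriers used as the baseline above typically follow a centralized, sense-reversing design: a lock-protected arrival counter plus a shared flag that waiting threads busy-wait on. The following is a minimal Python sketch of that general technique, not the evaluated implementation; the names `SenseReversingBarrier` and `run_demo` are hypothetical. Python threads under the GIL only model the algorithm's ordering guarantees and say nothing about hardware performance.

```python
import threading
import time


class SenseReversingBarrier:
    """Centralized barrier: a lock-protected arrival counter plus a shared
    'sense' flag that waiting threads busy-wait on."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.count = n_threads           # threads still to arrive this episode
        self.sense = False               # shared release flag
        self.lock = threading.Lock()     # protects the counter
        self.local = threading.local()   # per-thread private sense

    def wait(self):
        # Flip this thread's private sense for the new barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
            if last:
                self.count = self.n      # reset before anyone is released
        if last:
            self.sense = my_sense        # release all spinning threads
        else:
            while self.sense != my_sense:
                time.sleep(0)            # spin; sleep(0) just yields the GIL


def run_demo(n_threads=4, rounds=3):
    """Each thread logs (round, thread_id) and then waits at the barrier."""
    barrier = SenseReversingBarrier(n_threads)
    log, log_lock = [], threading.Lock()

    def worker(tid):
        for r in range(rounds):
            with log_lock:
                log.append((r, tid))
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

Because no thread can log round r+1 until every thread has logged round r and arrived at the barrier, the round numbers in the returned log never decrease. The single counter and lock are also what make this design contend heavily as the thread count grows.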
Barrier synchronization is a key programming primitive for shared-memory embedded MPSoCs. A...
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-mem...
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A...
In this work, we present a novel hardware-based barrier mechanism for synchronization on many-core C...
Whereas efficient barrier implementations were once a concern only in high-performance compu...
This paper investigates optimized synchronization techniques for shared memory on-chip multiprocesso...
Barriers are widely used for synchronization in parallel programs. Since the process arrived earlier t...
This paper presents a new methodology for implementing fast synchronization on scalable cache-cohere...
Existing multiprocessor synchronization mechanisms are relatively heavyweight, due in part to the le...
Barrier synchronization in shared-memory parallel machines has been widely implemented through busy...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program perf...
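One well-known scalable busy-wait barrier that avoids a single contended counter is the dissemination barrier: in each of ceil(log2 n) rounds, thread i signals thread (i + 2^k) mod n and spins on its own flag until thread (i - 2^k) mod n has signalled it. The following is a minimal Python sketch of that technique, offered as an illustration rather than as the algorithm of the abstract above; the names `DisseminationBarrier` and `run_demo` are hypothetical. Per-round signal counters replace the sense/parity flags of classic formulations so that the barrier can be reused.

```python
import threading
import time


class DisseminationBarrier:
    """Dissemination barrier for n threads: ceil(log2 n) rounds; in round k,
    thread i signals thread (i + 2**k) % n and waits for (i - 2**k) % n."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.rounds = (n_threads - 1).bit_length()   # ceil(log2 n)
        # flags[i][k] counts signals thread i has received in round k.
        # Each slot has exactly one writer and one reader, so no lock is needed.
        self.flags = [[0] * self.rounds for _ in range(n_threads)]
        self.episode = [0] * n_threads               # barriers entered per thread

    def wait(self, tid):
        self.episode[tid] += 1
        e = self.episode[tid]
        for k in range(self.rounds):
            partner = (tid + (1 << k)) % self.n
            self.flags[partner][k] += 1              # signal this round's partner
            while self.flags[tid][k] < e:            # spin until we are signalled
                time.sleep(0)                        # yield the GIL while spinning


def run_demo(n_threads=5, rounds=3):
    """Each thread logs (round, thread_id) and then waits at the barrier."""
    barrier = DisseminationBarrier(n_threads)
    log, log_lock = [], threading.Lock()

    def worker(tid):
        for r in range(rounds):
            with log_lock:
                log.append((r, tid))
            barrier.wait(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

After round k, a thread knows that the 2^(k+1) - 1 threads preceding it (mod n) have arrived, so after ceil(log2 n) rounds every thread has transitively heard from all others. Each thread spins only on its own flags, which is what makes this style of barrier attractive on cache-coherent machines.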
This paper proposes and evaluates new synchronization schemes for a simultaneous multithrea...
Synchronization in parallel programs is a major performance bottleneck. Shared data is pro...
Barrier synchronisation has been a widely studied topic since the supercomputer era due to its significant...