We present a work-stealing algorithm for total-store memory architectures, such as Intel's x86, that does not rely on atomic read-modify-write instructions such as compare-and-swap. In our algorithm, processors communicate solely by reading from and writing (non-atomically) into weakly consistent memory. We also show that join resolution, an important problem in scheduling parallel programs, can be solved without atomic read-modify-write instructions. At a high level, our work-stealing algorithm closely resembles traditional work-stealing algorithms, but certain details are more complex: instead of relying on atomic read-modify-write operations, it uses a steal protocol that enables processors to perform load balancing...
Work-stealing is a promising approach for effectively exploiting software parallelism on parallel ha...
Various memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see it...
Updating a shared data structure in a parallel program is usually done with some sort of hi...
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlle...
Load balancing is a technique which allows efficient parallelization of irregular workloads, and a k...
Many hardware primitives have been proposed for synchronization and atomic memory update on shared-m...
Modern multiprocessor systems offer advanced synchronization primitives, built in hardware, to suppo...
The fork-join paradigm of concurrent expression has gained popularity in conjunction with work-steal...
This paper considers quorum-replicated, multi-writer, multi-reader (MWMR) implementations ...
This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A p...