The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared-memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S1, we show that the expected space requirement is S1 + O(K p D) on p processors, where K is a user-adjustable ...
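As a quick illustration of the S1 + O(K p D) bound above, the following sketch evaluates it for sample parameter values. The function name `space_bound` and the constant `c` (standing in for the hidden constant of the O-term) are our own assumptions, not part of the paper:

```python
def space_bound(s1, k, p, d, c=1):
    """Expected-space bound S1 + O(K * p * D) from the abstract above.

    s1: serial space requirement of the program
    k:  the user-adjustable parameter K
    p:  number of processors
    d:  depth of the nested-parallel computation
    c:  hypothetical constant hidden in the O(.) term (an assumption)
    """
    return s1 + c * k * p * d

# The extra space over the serial requirement grows with p and D,
# but is independent of the total work of the program.
extra = space_bound(s1=1_000_000, k=1024, p=8, d=20) - 1_000_000
print(extra)  # 1024 * 8 * 20 = 163840
```

Note that the overhead term scales with the number of processors and the depth, not with the total work, which is what makes the bound useful for programs with abundant fine-grained parallelism.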
We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its perfo...
Multithreading has become a dominant paradigm in general purpose MIMD parallel computation. To execu...
This paper describes a method to improve the cache locality of sequential programs by scheduling fin...
Abstract The running time and memory requirement of a parallel program with dynamic, lightweight th...
Abstract The goal of high-level parallel programming models or languages is to facilitate the writin...
Many of today's high level parallel languages support dynamic, fine-grained parallelism. These ...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently exec...
Most parallel programs exhibit more parallelism than is available in processors produced today. Whi...
In this paper, we present a randomized, online, space-efficient algorithm for the general class of p...
In this paper we propose new insights into the problem of concurrently scheduling threads through ma...