The task parallel programming model allows programmers to express concurrency at a high level of abstraction and delegates the scheduling of parallel execution to the runtime system. Efficient scheduling of tasks on multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, including shared caches and non-uniform memory access (NUMA) characteristics. In this dissertation, we study the performance impact of these issues and of other factors that limit parallel speedup in task parallel program executions, and propose new scheduling strategies to improve performance. Our performance model characterizes lost efficiency in terms of overhead time, idle time, and work time ...
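The decomposition above can be illustrated with a minimal sketch. The per-thread field names and the efficiency formula below are illustrative assumptions, not the dissertation's exact model: they simply treat efficiency as the fraction of total thread-time spent in useful work rather than in scheduling overhead or idling.

```python
# Hedged sketch: decompose each thread's elapsed time into work, overhead,
# and idle components, then compute aggregate parallel efficiency.
# Field names and formula are illustrative assumptions.

def parallel_efficiency(threads):
    """Fraction of total thread-time spent doing useful work."""
    total_work = sum(t["work"] for t in threads)
    total_time = sum(t["work"] + t["overhead"] + t["idle"] for t in threads)
    return total_work / total_time if total_time else 0.0

# Example: 4 threads, each measured over a 10-second run (times in seconds).
threads = [
    {"work": 9.0, "overhead": 0.5, "idle": 0.5},
    {"work": 8.0, "overhead": 1.0, "idle": 1.0},
    {"work": 7.5, "overhead": 0.5, "idle": 2.0},
    {"work": 9.5, "overhead": 0.2, "idle": 0.3},
]

print(f"efficiency = {parallel_efficiency(threads):.2f}")  # prints "efficiency = 0.85"
```

Under this accounting, efficiency losses show up directly as growth in the overhead or idle components, which is what makes the decomposition useful for diagnosing scheduler behavior.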
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support th...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1993. Simultaneously published...
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify t...
Modern computer architectures expose an increasing number of parallel features supported by complex ...
Future multi- and many-core processors are likely to have tens of cores arranged in a tiled archite...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can...
We present a joint scheduling and memory allocation algorithm for efficient ex...
Dynamic task-parallel programming models are popular on shared-memory systems,...