In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: Parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3-1.6X performance improvement relative to WS for several fine-grain parallel benchmarks on projected future CMP configurations; we ...
Microprocessor industry has converged on chip multiprocessor (CMP) as the architecture of choice to ...
CMPs allow threads to share portions of the on-chip cache. Critical to successful sharing are the p...
Current architectural trends of rising on-chip core counts and worsening power-performance penalties...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good per...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good per...
We present a new operating system scheduling algorithm for multicore processors. Our algorithm reduc...
Computational task DAGs are executed on parallel computers by a task scheduling algorithm. Intellige...
Cache utilisation is often very poor in multithreaded applications, due to the loss of data access l...
Most parallel programs exhibit more parallelism than is available in processors produced today. Whi...
The evolution of microprocessor design in the last few decades has changed significantly, moving fro...
Chip-level multiprocessors (CMP) have multiple processing cores (Cores) and generally have their cac...
This paper describes a method to improve the cache locality of sequential programs by scheduling fin...
In recent years, the increasing design complexity and the problems of power and heat dissipation hav...
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management ...