The ever-increasing gap between processor and main memory speeds requires careful utilization of the limited memory link. The need is even greater for memory-bound applications. Prioritizing memory requests in the memory controller is one approach to improving the performance of such codes. However, current designs do not consider high-level information about parallel applications. In this paper, we propose a holistic approach to this problem, in which runtime system-level knowledge is made available in hardware. The processor exploits this information to better prioritize memory requests, while introducing negligible hardware cost. Our design is based on the notion of critical path in the execution of a parallel...
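To make the idea of criticality-based prioritization concrete, here is a minimal sketch of a request queue that serves requests tagged as critical before all others, and breaks ties by arrival order. This is not the design from the paper: the class `CriticalityAwareQueue`, the `critical` flag, and the helper `drain` are hypothetical names introduced only for illustration, and the sketch assumes the runtime system has already marked which requests belong to tasks on the critical path. A real memory controller would also weigh factors such as row-buffer locality and starvation, which are ignored here.

```python
import heapq
from itertools import count


class CriticalityAwareQueue:
    """Toy memory-request queue (hypothetical, for illustration only)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonically increasing arrival stamp

    def enqueue(self, address, critical):
        # Smaller tuples pop first: critical requests (0) go ahead of
        # non-critical ones (1); within a class, older requests go first.
        rank = 0 if critical else 1
        heapq.heappush(self._heap, (rank, next(self._arrival), address))

    def issue(self):
        """Return the address of the next request to serve, or None if empty."""
        if not self._heap:
            return None
        _, _, address = heapq.heappop(self._heap)
        return address


def drain(queue):
    """Issue requests until the queue is empty and return the service order."""
    order = []
    while (address := queue.issue()) is not None:
        order.append(address)
    return order


if __name__ == "__main__":
    q = CriticalityAwareQueue()
    q.enqueue(0x100, critical=False)
    q.enqueue(0x200, critical=True)   # assumed to come from a critical-path task
    q.enqueue(0x300, critical=False)
    q.enqueue(0x400, critical=True)
    print([hex(a) for a in drain(q)])
```

In this sketch the two requests marked critical are served first, in their arrival order, followed by the remaining requests. Because non-critical requests wait behind every critical one, a practical policy would likely need an aging or fairness mechanism.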
In chip multiprocessor (CMP) systems with multi-application workloads, communication and memory acce...
This paper proposes and evaluates prioritized direct shared-memory multiprocessor networks. We use t...
Efficient performance tuning of parallel programs is often hard. Optimization is often done when the...
2013 Fall. Includes bibliographical references. In chip multi-processor (CMP) systems, communication a...
Research on computer memory systems has been of increasing importance over the last decade, as they ...
The full potential of chip multiprocessors remains unexploited due to the thread-oblivious memory ...
It is argued that scheduling is an important determinant of performance for many parallel symbolic...
Although some instructions hurt performance more than others, current processors typically apply sch...
The evolution of computers is moving more and more towards multi-core processors and parallel progra...
Modern processors remove many artificial constraints on instruction ordering, permitting multiple ins...
Due to the limitations of instruction-level parallelism, thread-level parallelism has become a popul...
Managing criticality in task-based programming models opens a wide range of performance and power op...
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the...
Recent technological advances have led to an increasing gap between memory and...