Performance of multithreaded applications is limited by a variety of bottlenecks, e.g., critical sections, barriers and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit the scalability of applications. This paper proposes Bottleneck Identification and Scheduling (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks. BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores on an Asymmetric Chip Multi-Processor (ACMP). Unlike previous work that targets specific bottlenecks, BIS can identify a...
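The identification-then-acceleration loop described above can be illustrated with a small software model. This is a minimal sketch, not the paper's hardware design: the class `BottleneckTable`, its methods, the integer bottleneck IDs, the one-bottleneck-per-fast-core policy, and the halving of counters at the end of each interval are all hypothetical choices made here for illustration. What it does show is the core idea stated in the abstract: charge waiting cycles to each bottleneck, then send the bottlenecks with the most accumulated waiting to the fast cores.

```python
from collections import defaultdict
from typing import Dict, List


class BottleneckTable:
    """Hypothetical model of per-bottleneck waiting-cycle bookkeeping.

    Assumptions (not taken from the paper): bottlenecks are identified by
    integer IDs, waiting cycles are reported in batches, and counters are
    halved at the end of each interval so that old behavior fades out.
    """

    def __init__(self, num_fast_cores: int) -> None:
        self.num_fast_cores = num_fast_cores
        self.waiting_cycles: Dict[int, int] = defaultdict(int)

    def record_wait(self, bottleneck_id: int, cycles: int) -> None:
        # A thread stalled on this bottleneck (lock, barrier, pipeline stage)
        # for `cycles` cycles; charge those cycles to the bottleneck.
        self.waiting_cycles[bottleneck_id] += cycles

    def select_for_acceleration(self) -> List[int]:
        # Rank bottlenecks by accumulated waiting cycles (most costly first)
        # and keep as many as there are fast cores (assumed policy).
        ranked = sorted(
            (bid for bid, c in self.waiting_cycles.items() if c > 0),
            key=lambda bid: self.waiting_cycles[bid],
            reverse=True,
        )
        return ranked[: self.num_fast_cores]

    def end_interval(self) -> None:
        # Assumed aging policy: halve every counter so stale bottlenecks decay.
        for bid in self.waiting_cycles:
            self.waiting_cycles[bid] //= 2


# Usage: one fast core, a heavily contended lock and a lightly used barrier.
table = BottleneckTable(num_fast_cores=1)
table.record_wait(bottleneck_id=7, cycles=1200)  # e.g. a contended critical section
table.record_wait(bottleneck_id=3, cycles=300)   # e.g. a barrier
print(table.select_for_acceleration())  # prints [7]
```

Ranking by waiting cycles rather than by how often a bottleneck executes reflects the abstract's criterion: a bottleneck matters in proportion to how long other threads sit idle waiting for it.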
Recent increases in hard fault rates in modern chip multi-processors have led to a variety of approa...
Applications may have unintended performance problems in spite of compiler optimizations, because of...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. crit...
When parallel applications do not fully utilize the cores that are available to them they are mi...
Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore pe...
Extracting high performance from Chip Multiprocessors (CMPs) requires that the application be pa...
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved...
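The sentence above is truncated, so the following is not necessarily its own formulation. Under the conventional definitions, with $T_1$ the execution time on one core and $T_N$ the execution time on $N$ cores, speedup $S(N)$ and parallel efficiency $E(N)$ are given below, and speedup is sublinear when it falls short of the core count. The numbers in the second line are a hypothetical illustration only.

```latex
S(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S(N)}{N}, \qquad
\text{sublinear speedup} \iff S(N) < N \iff E(N) < 1

\text{e.g. } T_1 = 100\,\text{s},\; T_{16} = 10\,\text{s}:\quad
S(16) = \frac{100}{10} = 10 < 16, \qquad E(16) = \frac{10}{16} = 0.625
```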
Large, high-frequency single-core chip designs are increasingly being replaced with larger chip mult...
This paper evaluates new techniques to improve performance and efficiency of Chip MultiProcessors (C...
Emerging architecture designs include tens of processing cores on a single chip die; it is believed ...
Many important workloads today, such as web-hosted services, are limited not by processor core perfo...
Dynamically determining the appropriate number of threads for a multi-threaded application may lead ...
As the microprocessor industry embraces multicore architectures, inherently parallel applications be...
Simultaneous multithreading (SMT) allows multiple hardware threads to execute concurrently on a proc...