Abstract
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g., critical sections, barriers, and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit the scalability of applications. This paper proposes Bottleneck Identification and Scheduling (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks. BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores on an Asymmetric Chip MultiProcessor (ACMP). Unlike previous work that targets specific bottlenecks, BIS can identif...
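The selection step described above (rank bottlenecks by how many cycles threads spend waiting on them, then accelerate the worst ones) can be illustrated with a small software model. This is a sketch under stated assumptions, not the BIS hardware mechanism: it assumes a hypothetical trace of wait samples `(bottleneck_id, waiting_threads, cycles)`, and it assumes one bottleneck is assigned per available fast core. The function names and bottleneck ids are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical wait sample: (bottleneck_id, waiting_threads, cycles).
WaitSample = Tuple[str, int, int]


def accumulate_wait_cycles(samples: Iterable[WaitSample]) -> Dict[str, int]:
    """Sum thread-waiting cycles per bottleneck.

    Each sample means `waiting_threads` threads were blocked on
    `bottleneck_id` for `cycles` cycles, contributing their product.
    """
    totals: Dict[str, int] = defaultdict(int)
    for bottleneck_id, waiting_threads, cycles in samples:
        totals[bottleneck_id] += waiting_threads * cycles
    return dict(totals)


def pick_bottlenecks_to_accelerate(totals: Dict[str, int], fast_cores: int) -> List[str]:
    """Return up to `fast_cores` bottleneck ids with the most waiting cycles.

    Assumption: one bottleneck per fast core. Ties are broken by id so the
    choice is deterministic; bottlenecks with zero waiting are never chosen.
    """
    ranked = sorted(
        (b for b, total in totals.items() if total > 0),
        key=lambda b: (-totals[b], b),
    )
    return ranked[:fast_cores]


if __name__ == "__main__":
    samples = [
        ("lock_A", 3, 100),    # 3 threads waited 100 cycles on a critical section
        ("barrier_1", 7, 20),  # 7 threads waited 20 cycles at a barrier
        ("stage_2", 1, 250),   # 1 thread waited 250 cycles on a slow pipeline stage
        ("lock_A", 2, 50),
    ]
    totals = accumulate_wait_cycles(samples)
    print(totals)  # {'lock_A': 400, 'barrier_1': 140, 'stage_2': 250}
    print(pick_bottlenecks_to_accelerate(totals, fast_cores=2))  # ['lock_A', 'stage_2']
```

Weighting by waiting threads times cycles, rather than by how often a bottleneck executes, is what lets a barrier that stalls many threads briefly compete fairly with a critical section that stalls a few threads for a long time.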
The use of parallelism enhances the performance of a software system. However, its excessive use can...
Over the past several decades, in keeping with Moore's law, the semiconductor industry was doubling ...
Exploitation of parallelism has for decades been central to the pursuit of computing performance. Th...
When parallel applications do not fully utilize the cores that are available to them, they are mi...
Extracting high performance from Chip Multiprocessors (CMPs) requires that the application be pa...
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved...
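Sublinear speedup can be made concrete with the usual definitions: speedup S(p) = T(1)/T(p), where T(p) is the execution time on p cores, and efficiency E(p) = S(p)/p. Speedup is sublinear when S(p) < p, i.e., when E(p) < 1. The sketch below is illustrative; the function names and the timings in the example are hypothetical, not measurements from any of the works listed here.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p) for a run on p cores."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_serial, t_parallel) / cores


def is_sublinear(t_serial: float, t_parallel: float, cores: int) -> bool:
    """True when the achieved speedup falls short of the core count."""
    return speedup(t_serial, t_parallel) < cores


if __name__ == "__main__":
    # Hypothetical timings: 100 s on 1 core, 20 s on 8 cores.
    print(speedup(100.0, 20.0))          # 5.0
    print(efficiency(100.0, 20.0, 8))    # 0.625
    print(is_sublinear(100.0, 20.0, 8))  # True
```

In this example, 8 cores deliver only a 5x speedup, so each core does on average 62.5% of the useful work it would contribute under linear scaling.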
Most modern personal computers come with processors which contain multiple cores. Often, one or more...
Ensuring the continuous scaling of parallel applications is challenging on many-core processors, due...
This paper evaluates new techniques to improve performance and efficiency of Chip MultiProcessors (C...
As hardware becomes increasingly parallel and the availability of scalable parallel software improve...
Large, high-frequency single-core chip designs are increasingly being replaced with larger chip mult...
Multithreading (MT), by simultaneously using both the thread-level parallelism and the instruction-l...
As the microprocessor industry embraces multicore architectures, inherently parallel applications be...
Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore pe...