Improving program performance through the use of multiple homogeneous processing elements, or cores, is commonplace. However, these architectures increase the complexity required at the software level. Existing work is focused on optimising programs that run in isolation on these systems, but ignores the fact that, in reality, these systems run multiple parallel programs concurrently, with programs competing for system resources. In order to improve performance in this shared environment, cooperative tuning of multiple, concurrently running parallel programs is required. Moreover, the set of programs running on the system – the system workload – is dynamic and rapidly changing. This makes cooperative tuning a challenge, as it must ...
The performance of a computer system is important. One way of improving performance is to use multip...
Funding: This work has been supported by the European Union Framework 7 grant IST-2011-288570 “ParaP...
This paper presents a new technique for introducing and tuning parallelism for heterogeneous shared-...
Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and in...
Emerging architecture designs include tens of processing cores on a single chip die; it is believed ...
The shift towards multicore processing has led to a much wider population of developer...
The tuning of parallel programs on large distributed-memory machines today is usually a costly, and ...
This paper describes a new parallel program tuning framework, with a new approach for tuning. The ap...
Auto-tuning has recently received significant attention from the High Performance Computing communi...
While parallel computing offers an attractive perspective for the future, developing efficient paral...
The availability of modern commodity multicore processors and multiprocessor computer systems has re...
The recent shift toward multi-core chips has pushed the burden of extracting performance to the prog...
Multicore clusters provide cost-effective platforms for running CPU-intensive and data-intensive para...
In recent years there has been a shift in microprocessor manufacture from building sing...
The emergence of multi-core systems opens new opportunities for thread-level parallelism an...