This paper presents COMPROF and COMPLACE, a novel profiling tool and thread placement technique for shared-memory architectures that require no recompilation or user intervention. We use dynamic binary instrumentation to intercept memory operations and estimate inter-thread communication overhead, deriving (and optionally visualising) a communication graph of data-sharing between threads. We then use this graph to map threads to cores so as to optimise traffic through the memory system. Different paths through a system's memory hierarchy have different latency, throughput and energy properties; COMPLACE exploits this heterogeneity to provide automatic performance and energy improvements for multi-threaded programs. We demonstrate C...
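The pipeline described in the abstract (record memory operations, derive a communication graph, place threads accordingly) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the COMPROF or COMPLACE implementation: the trace format of `(thread_id, address)` pairs, the 64-byte cache-line granularity, the function names `communication_graph` and `greedy_placement`, and the greedy grouping heuristic are all hypothetical. Sharing is approximated here by counting cache lines touched by both threads of a pair, which is cruder than estimating true communication overhead.

```python
from collections import defaultdict
from itertools import combinations


def communication_graph(trace, line_size=64):
    """Build a weighted graph of data sharing between threads.

    trace: iterable of (thread_id, address) pairs (hypothetical format).
    Returns {(a, b): weight} with a < b, where weight is the number of
    cache lines that both threads accessed.
    """
    sharers = defaultdict(set)  # cache line index -> threads that touched it
    for tid, addr in trace:
        sharers[addr // line_size].add(tid)

    graph = defaultdict(int)
    for tids in sharers.values():
        for a, b in combinations(sorted(tids), 2):
            graph[(a, b)] += 1
    return dict(graph)


def greedy_placement(graph, threads, group_size=2):
    """Assign threads to groups of cores that share part of the hierarchy.

    Heaviest edges are considered first, so strongly communicating
    threads tend to land in the same group. Returns {thread: group}.
    """
    groups = []  # groups[g] is the list of threads placed in group g
    where = {}   # thread -> group index
    for (a, b), _ in sorted(graph.items(), key=lambda kv: -kv[1]):
        if a not in where and b not in where:
            if group_size >= 2:
                where[a] = where[b] = len(groups)
                groups.append([a, b])
        elif (a in where) != (b in where):
            placed, free = (a, b) if a in where else (b, a)
            g = where[placed]
            if len(groups[g]) < group_size:
                groups[g].append(free)
                where[free] = g

    # Threads with no placed partner fill remaining slots or open new groups.
    for t in threads:
        if t not in where:
            for g, members in enumerate(groups):
                if len(members) < group_size:
                    members.append(t)
                    where[t] = g
                    break
            else:
                where[t] = len(groups)
                groups.append([t])
    return where
```

On a real system the resulting groups would then be applied with an operating-system affinity mechanism; that step is omitted here because it depends on the machine's topology.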
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...
Chip multiprocessors (CMPs) commonly share a large portion of memory system resources among dif...
To improve program performance on today's clusters, clouds and multicore comput...
Funding: This work was generously supported by UK EPSRC Energise, grant number EP/V006290/1. This pap...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
The parallelism in shared-memory systems has increased significantly with the ...
In a multicore environment, inter-thread communication can provide valuable insights about applicat...
Modern computers are based on manycore architectures, with multiple processors on a single silicon ...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
Multithreading has emerged as a leading paradigm for the development of applications with demanding ...
Current and future architectures rely on thread-level parallelism to sustain p...
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines...
Across the landscape of computing, parallelism within applications is increasingly important in orde...
In this dissertation we present a methodology for predicting the best priority pair for a given co-s...