Funding: This work was generously supported by UK EPSRC Energise, grant number EP/V006290/1.
This paper presents COMPROF and COMPLACE, a novel profiling tool and thread placement technique for shared-memory architectures that requires no recompilation or user intervention. We use dynamic binary instrumentation to intercept memory operations and estimate inter-thread communication overhead, deriving (and optionally visualising) a communication graph of data sharing between threads. We then use this graph to map threads to cores in order to optimise traffic through the memory system. Different paths through a system's memory hierarchy have different latency, throughput and energy properties; COMPLACE exploits this heterogeneity to provide...
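The abstract above describes the profiling-then-placement pipeline only at a high level. The following is a minimal Python sketch of that general idea. It assumes that a trace of (thread, address) pairs has already been captured, for example by a binary-instrumentation tool. It builds a symmetric communication matrix, which serves as the weighted communication graph, and then greedily places the most heavily communicating threads on sibling cores. Several details are illustrative assumptions and are not COMPROF's or COMPLACE's actual algorithm: the "last accessor of a cache line" heuristic, the 64-byte line size, the topology in which cores 2k and 2k+1 share a cache, and the function names.

```python
from itertools import combinations

LINE_SIZE = 64  # assumed cache-line size in bytes


def communication_matrix(trace, n_threads, line_size=LINE_SIZE):
    """Estimate inter-thread communication from a memory-access trace.

    trace: iterable of (thread_id, address) pairs in observed order.
    Heuristic (an assumption): one communication event is counted each time
    a thread touches a cache line whose previous accessor was another thread.
    """
    comm = [[0] * n_threads for _ in range(n_threads)]
    last_accessor = {}  # cache-line index -> thread that touched it last
    for tid, addr in trace:
        line = addr // line_size
        prev = last_accessor.get(line)
        if prev is not None and prev != tid:
            # Symmetric matrix: edge weights of an undirected communication graph.
            comm[tid][prev] += 1
            comm[prev][tid] += 1
        last_accessor[line] = tid
    return comm


def greedy_pair_mapping(comm):
    """Map threads to cores, pairing the most-communicating threads first.

    Assumes a hypothetical topology in which cores 2k and 2k+1 share a cache.
    Returns a dict {thread_id: core_id}.
    """
    n = len(comm)
    # All thread pairs, heaviest communication first (ties keep input order).
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: comm[p[0]][p[1]], reverse=True)
    placement = {}
    next_core = 0
    for a, b in pairs:
        if a not in placement and b not in placement:
            placement[a] = next_core
            placement[b] = next_core + 1
            next_core += 2
    # With an odd thread count, one thread is left over; give it its own core.
    for t in range(n):
        if t not in placement:
            placement[t] = next_core
            next_core += 1
    return placement


if __name__ == "__main__":
    # Threads 0 and 2 share one cache line; threads 1 and 3 share another.
    trace = [(0, 0), (2, 8), (0, 16), (1, 4096), (3, 4100), (1, 4096)]
    comm = communication_matrix(trace, n_threads=4)
    print(comm)
    print(greedy_pair_mapping(comm))
```

In this toy trace, threads 0 and 2 end up on cores 0 and 1, and threads 1 and 3 end up on cores 2 and 3. Each sharing pair therefore sits behind the same (assumed) shared cache. Greedy pairing is used here only because it is short. A real placement tool would more likely partition the graph against the machine's actual cache and NUMA hierarchy.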
Efficiently programming shared-memory machines is a difficult challenge becaus...
The performance and energy efficiency of modern architectures depend on memory locality, which can b...
Modern computers are based on manycore architectures, with multiple processors on a single silicon ...
The parallelism in shared-memory systems has increased significantly with the ...
In a multicore environment, inter-thread communication can provide valuable insights about applicat...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines...
Multicore processors have become ubiquitous in today's computing platforms, extending from smartphon...
Current and future architectures rely on thread-level parallelism to sustain p...
Non-blocking collectives have been proposed so as to allow communications to b...
The era of multi-core processors has begun. These multi-core processors represent a significant shi...
The multicore era has initiated a move to ubiquitous parallelization of software. In the process, co...
Multicomputers (distributed-memory MIMD machines) have emerged as inexpensive yet powerful parallel...
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...