Parallelizing industrial simulation codes such as EUROPLEXUS, a software package dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on efficient parallelization on a multi-core shared-memory node. We propose that each thread gather the data it needs to process a given iteration range before actually advancing the computation by one time step on this range. This lazy, cache-aware layout construction makes it possible to keep the original data structure and leads to very localised code modifications. We show that this approach can improve the execution time by up to 40% when the task size is set so that the data fit in the L2 cache.
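The gather-then-compute pattern described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the EUROPLEXUS implementation: the names `advance_range`, `step`, `node_values`, `connectivity` and `chunk` are hypothetical, the per-element update is a placeholder, and `chunk` merely stands in for the task size that the paper tunes to the L2 cache. In Python the sketch shows the control flow only; any cache benefit would come from a compiled implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_range(node_values, connectivity, out, lo, hi, dt):
    """Advance elements lo..hi-1 by one time step (hypothetical update rule)."""
    # Gather phase: copy the node values this range touches into a compact
    # local buffer, leaving the original data structure untouched.
    local_index = {}
    local_values = []
    for e in range(lo, hi):
        for n in connectivity[e]:
            if n not in local_index:
                local_index[n] = len(local_values)
                local_values.append(node_values[n])

    # Compute phase: read only from the compact buffer.
    for e in range(lo, hi):
        nodes = connectivity[e]
        mean = sum(local_values[local_index[n]] for n in nodes) / len(nodes)
        out[e] += dt * mean  # each element is written by exactly one task


def step(node_values, connectivity, out, chunk, dt, workers=4):
    """Split the element loop into ranges of `chunk` elements, one task each."""
    n_elem = len(connectivity)
    ranges = [(lo, min(lo + chunk, n_elem)) for lo in range(0, n_elem, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(advance_range, node_values, connectivity, out, lo, hi, dt)
            for lo, hi in ranges
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    conn = [(0, 1), (1, 2), (2, 3)]
    result = [0.0, 0.0, 0.0]
    step(values, conn, result, chunk=2, dt=0.5)
    print(result)
```

Because the ranges are disjoint and each task writes only its own elements, no locking is needed in this sketch. The gather is done lazily, inside the task, which is what keeps the change local to the loop body.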
Abstract—The emergence of multi-core systems opens new opportunities for thread-level parallelism an...
This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent ...
Memory subsystem, in particular, cache design is important for both high performance and embedded co...
Scientific and industrial applications that need high computational performance to be used are alway...
In hardware/software codesign, Discrete Event Simulation (DES) has been in use for decades to verify...
An application’s cache miss rate is used in timing analysis, system performance prediction and ...
purpose of this paper is to propose code transformation techniques on the application program subjec...
We present a new technique for the parallel simulation of cache coherent shared memory multiprocess...
Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern m...
There has been a significant amount of research on hardware and software support for efficient concu...
The gap between processor speed and memory latency has led to the use of caches in the memory system...
The memory-processor speed gap has grown so large that in modern systems accessing the main memory r...
Shared-memory multi-processor/multi-core machines have become a reference for many application conte...