This article describes how we managed to increase the performance and extend the features of a large parallel application by using simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations on a few thousand cores, from 1k up to 16k cores on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase these needs by a large factor, making Gysela a good example of an application that requires Exascale machines. To reduce Gysela compute times, we take advantage of the efficient SMT implementations available on recent Intel architectures. We also analyze the cost of a transposi...
For more than a decade, single-core compute performance has stopped doubling every 18-24 months. Phys...
Gyrokinetic simulations lead to huge computational needs. Up to now, the semi-...
Over the last few decades, Message Passing Interface (MPI) has become the parallel-communication sta...
A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems. Rio Yokota...
The limits of sequential processing continue to be overcome with parallel and distributed architectu...
In this whitepaper we describe the effort we have made to measure performance of applications and sy...
The current generation of the Xeon Phi Knights Landing (KNL) processor provide...
The number of active threads in a multi-core processor varies over time and is often much smaller th...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...
Modern computer architectures have evolved towards multi-core, multi-socket CPUs. Exploiting optimal...
Gyrokinetic simulations lead to huge computational needs. Up to now, the semi-Lagrangian co...
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruc...
Communication and computation overlapping techniques have been introduced in the five‐dimensional gy...
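The Gysela abstract mentions a robust parallel transpose algorithm but, in this excerpt, does not describe it. As a generic illustration only (not Gysela's actual algorithm), a distributed transpose under a 1D row-block decomposition can be sketched as three steps: pack local column sub-blocks, exchange them all-to-all, then transpose and place each received block. The sketch assumes a square n x n matrix, p ranks, and n divisible by p; all function names are hypothetical, and the exchange is simulated in-process rather than performed with MPI.

```python
from typing import List

Matrix = List[List[int]]


def split_rows(a: Matrix, p: int) -> List[Matrix]:
    """Give each of p hypothetical ranks a contiguous block of rows (1D decomposition)."""
    n = len(a)
    assert n % p == 0, "sketch assumes n is divisible by p"
    b = n // p
    return [[row[:] for row in a[r * b:(r + 1) * b]] for r in range(p)]


def pack(local: Matrix, p: int) -> List[Matrix]:
    """Cut a b x n local block into p column sub-blocks of size b x b, one per destination."""
    b = len(local)
    return [[row[s * b:(s + 1) * b] for row in local] for s in range(p)]


def all_to_all(send: List[List[Matrix]]) -> List[List[Matrix]]:
    """Simulated all-to-all exchange: rank s receives send[r][s] from every rank r."""
    p = len(send)
    return [[send[r][s] for r in range(p)] for s in range(p)]


def unpack(recv_blocks: List[Matrix]) -> Matrix:
    """Transpose each received b x b block and store it at its source rank's column offset."""
    p = len(recv_blocks)
    b = len(recv_blocks[0])
    local = [[0] * (b * p) for _ in range(b)]
    for r, blk in enumerate(recv_blocks):
        for i in range(b):
            for j in range(b):
                local[i][r * b + j] = blk[j][i]
    return local


def distributed_transpose(local_blocks: List[Matrix]) -> List[Matrix]:
    """Row-block distributed A -> row-block distributed A^T via pack / all-to-all / unpack."""
    p = len(local_blocks)
    send = [pack(loc, p) for loc in local_blocks]
    recv = all_to_all(send)
    return [unpack(recv[s]) for s in range(p)]


if __name__ == "__main__":
    n, p = 6, 3
    a = [[10 * i + j for j in range(n)] for i in range(n)]
    for rank, blk in enumerate(distributed_transpose(split_rows(a, p))):
        print(rank, blk)
```

In a real MPI code, the simulated `all_to_all` step would correspond to a collective such as `MPI_Alltoall` (or `MPI_Alltoallv` when block sizes differ between ranks); the pack and unpack loops are where cache behavior and thread-level parallelism typically matter.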