The gap between a supercomputer's theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine's peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern "accelerator" architectures: collections of hundreds of simple, low-clocked cores capable of executing the same instruction on dozens of pieces of data simultaneously. This is a significant change from the low number of high-clocked cores found in traditional CPUs, and effective utilisation of accelerators typically requires extensive code and algori...
For several years now, classical multiprocessor systems have evolved to multicor...
Input/output (I/O) operations can represent a significant proportion of the run-time when large scie...
Ensuring the continuous scaling of parallel applications is challenging on many-core processors, due...
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level be...
This paper investigates the development of a molecular dynamics code that is highly portable between...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...
Current supercomputer development trends present severe challenges for scientific codebases. Moore’s...
The amelioration of high performance computing platforms has provided unprecedented computing power ...
Parallelism is ubiquitous in modern computer architectures. Heterogeneity of CPU cores and deep memo...
Computing hardware, from mobile devices to supercomputer clusters, is undergoi...
Portability, an oftentimes sought-after goal in scientific applications, confers a number of possibl...
The design of microprocessor technology has hit several "walls" in recent decades. These limits o...
Legacy code performance has failed to keep up with that of modern hardware. Many new hardware featur...
This paper presents an investigation into the development of performance metrics for sequential and ...
Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for...