Application-level checkpointing has been one of the most popular techniques to proactively deal with unexpected failures in supercomputers with hundreds of thousands of cores. Unfortunately, this approach results in heavy I/O load and often causes I/O bottlenecks in production runs. In this paper, we examine a new thread-based application-level checkpointing for a massively parallel electromagnetic solver system on the IBM Blue Gene/P at Argonne National Laboratory and the Cray XK6 at Oak Ridge National Laboratory. We discuss an I/O-thread based, application-level, two-phase I/O approach, called “threaded reduced-blocking I/O” (threaded rbIO), and compare it with a regular version of “reduced-blocking I/O” (rbIO) and a tuned MPI-IO coll...
Many scientific applications are I/O intensive and have tremendous I/O requirements, including check...
As the number of CPU cores in high-performance computing platforms continues to grow, the availabili...
Clusters of industry-standard multiprocessors are emerging as a competitive alternative for large-sc...
As the number of processors increases to hundreds of thousands in parallel computer archite...
Input/output (I/O) from various sources often contend for scarcely available b...
As the size of supercomputers increases, the probability of system failure grows substantially, posi...
This paper investigates approaches for massively parallel partitioned solver systems. Typically, suc...
High performance computing (HPC) is changing the way science is performed in the 21st Century; exper...
We present a new approach to handling the demanding I/O workload incurred during checkpoint writes e...
Parallel applications are usually able to achieve high computational performance but suffer...
The large scale of current and next-generation massively parallel processing (MPP) systems presents ...
Due to the increase and complexity of computer systems, r...
Multiple threads running in a single, shared address space is a simple model for writing parallel pr...
In this paper we present research on improving the resilience of the execution of scientific softwar...
Efficient checkpointing of distributed data structures periodically at key mom...