Wide-area systems are gaining popularity as an infrastructure for running scientific applications. From a fault-tolerance perspective, these environments are challenging because of their scale and inherent variability. Causal message logging protocols have attractive properties that make them suitable for such environments: they spread fault-tolerance information across the system, which provides high availability, and this information can also be used to replicate objects that would otherwise be inaccessible due to network partitions. However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Logging Protocol (HCML), which uses a hierarchy of shared logging sites, or ...
Sender-based message logging supports transparent fault tolerance in distributed systems in whic...
An important set of challenges emerges as the High Performance Computing (HPC) community aims to rea...
Message logging is an attractive solution to provide fault tolerance for messa...
Wide-area systems are gaining in popularity as an infrastructure for running scientific applications...
The era of petascale computing brought machines with hundreds of thousands of processors. The next g...
Fault tolerance in MPI becomes a main issue in the HPC community. Several appr...
Computing systems will grow significantly larger in the near future to satisfy the needs of...
Message logging is a popular technique for building systems that can tolerate process crashes and tr...
Causal message-logging protocols have several attractive properties: they introduce no blo...
Fault tolerance can allow processes executing in a computer system to survive failures within the sy...
Causal message-logging protocols have several attractive properties: they introduce no blocking, se...
A look at Exascale reveals a future with multicore supercomputers that will inexorably expe...
The predicted failure rates of future supercomputers loom the groundbreaking research larg...