Recently, a Checkpointing and Communication Library (CCL) for optimistic simulation on Myrinet-based Networks of Workstations (NOWs) has been presented. CCL offloads checkpoint operations from the CPU by delegating them to a programmable DMA engine on the Myrinet network card. CCL also provides functionality for freezing the simulation application on demand, which can be used to maintain data consistency (for example, when a state buffer must be modified while a DMA-based checkpoint operation involving it is still in progress). Programming the DMA engine to perform each checkpoint operation as a single-burst transfer of a large data block keeps checkpoint latency low. This reduces the...
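The interplay between asynchronous checkpointing and on-demand freezing can be illustrated with a minimal Python sketch. It is not CCL's API. It assumes a background thread as a stand-in for the network card's DMA engine, and the names `StateBuffer`, `checkpoint_async`, `freeze_until_checkpointed` and `write_state` are hypothetical. The sketch shows the idea described above: the CPU returns immediately after starting a checkpoint and blocks only when it tries to modify a buffer whose checkpoint is still in flight.

```python
import threading
from typing import Optional


class StateBuffer:
    """A simulation object's state plus the completion event of any in-flight checkpoint."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.pending: Optional[threading.Event] = None


def dma_copy(src: bytearray, dst: bytearray, done: threading.Event) -> None:
    """Hypothetical stand-in for the NIC DMA engine.

    Copies the whole block in one operation, mimicking a single-burst
    transfer, then signals completion.
    """
    dst[:] = src
    done.set()


def checkpoint_async(buf: StateBuffer) -> bytearray:
    """Start a non-blocking checkpoint; the caller continues immediately."""
    snapshot = bytearray(len(buf.data))
    done = threading.Event()
    buf.pending = done
    # A background thread plays the role of the offloaded DMA transfer.
    threading.Thread(target=dma_copy, args=(buf.data, snapshot, done)).start()
    return snapshot


def freeze_until_checkpointed(buf: StateBuffer) -> None:
    """Block only if a checkpoint involving this buffer is still in progress."""
    if buf.pending is not None:
        buf.pending.wait()
        buf.pending = None


def write_state(buf: StateBuffer, offset: int, value: int) -> None:
    """Modify the state safely: freeze first so the in-flight snapshot stays consistent."""
    freeze_until_checkpointed(buf)
    buf.data[offset] = value
```

For example, starting a checkpoint of a buffer whose first byte is 7 and then calling `write_state(buf, 0, 9)` leaves the snapshot holding 7 and the live state holding 9. The freeze is taken lazily, only at the first modification. This mirrors the point of offloading: the CPU pays no copy cost unless it actually needs the buffer before the transfer finishes.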
The increasing number of cores on current supercomputers will quickly decrease the mean time to fail...
Checkpointing is widely used in robust fault-tolerant applications. We present an efficient incremen...
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the red...
CCL (checkpointing and communication library) is a software layer in support of optimistic parallel ...
Checkpointing-and-Communication Library (CCL) is recently developed software implementing CPU offl...
Great effort has been devoted to the design of optimized checkpointing strategies for optimistic par...
CCL (Checkpointing and Communication Library) is recently developed software in support of optimis...
Checkpointing overhead is a major obstacle to the effectiveness of Time Warp parallel discrete even...
This paper describes a non-blocking checkpointing mode in support of optimistic parallel discrete e...
In this paper we present a communication layer for Myrinet-based clusters, designed to efficiently s...
In this article we focus on checkpoint/restore facilities for optimistic simulation objects with gen...
Discrete event simulation is an important tool for modeling and analysis. Some of the simulation app...
In this paper we present a software approach, namely Fast-software-Checkpointing (FSC), to reduce th...
By leveraging enormous computational capabilities, scientists today are able to ...