Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize fault tolerance techniques such that the probability of meeting the deadlines, i.e., the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing; none has addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search...
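To make the idea of searching over checkpoint distributions concrete, the following is a minimal Python sketch. It is not the expression derived in this work, which is not reproduced in the abstract. It rests on assumptions introduced purely for illustration: time is measured in integer units, a segment of length `t` followed by a checkpoint costs `t + tau` units per attempt, each attempt succeeds independently with probability `exp(-lam * t)`, and a failed attempt is re-executed from the previous checkpoint. The names `loc`, `compositions`, `best_distribution`, `tau`, `lam`, and `deadline` are all hypothetical.

```python
import math
from itertools import combinations


def loc(segments, tau, lam, deadline):
    """Probability that every segment completes within `deadline` time units.

    Assumed model (not taken from the paper): each attempt of a segment of
    length t costs t + tau and succeeds with probability exp(-lam * t).
    """
    # dist[b] = probability that the segments processed so far used exactly b units
    dist = [0.0] * (deadline + 1)
    dist[0] = 1.0
    for t in segments:
        cost = t + tau
        q = math.exp(-lam * t)  # success probability of one attempt
        new = [0.0] * (deadline + 1)
        for used, p in enumerate(dist):
            if p == 0.0:
                continue
            attempt_p = q  # probability of succeeding on exactly the k-th attempt
            finish = used + cost
            while finish <= deadline:
                new[finish] += p * attempt_p
                attempt_p *= 1.0 - q
                finish += cost
        dist = new
    # Probability mass that stayed within the deadline
    return sum(dist)


def compositions(total, parts):
    """Yield every way to split `total` units into `parts` positive integer segments."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def best_distribution(total, parts, tau, lam, deadline):
    """Exhaustively search all segment-length tuples and return one with maximal LoC."""
    return max(compositions(total, parts),
               key=lambda s: loc(s, tau, lam, deadline))


if __name__ == "__main__":
    best = best_distribution(total=12, parts=3, tau=1, lam=0.05, deadline=24)
    print(best, loc(best, 1, 0.05, 24))
    print((4, 4, 4), loc((4, 4, 4), 1, 0.05, 24))
```

The number of candidate distributions grows combinatorially with the job length and the number of checkpoints, so a sketch like this is only practical for small instances.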
Checkpointing is a very well known mechanism to achieve fault tolerance. In distributed applications...
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault ...
This work provides an analysis of checkpointing strategies for minimizing expe...
To combat the increasing soft error rates in recent semiconductor technologies, it is important to e...
Correct operation of real-time systems (RTS) is defined as producing correct results within given ti...
The application of checkpointing as a fault-tolerance measure for real-time services (i.e., services...
For the vast majority of computer systems correct operation is defined as producing the correct resu...
Increasing soft error rates in recent semiconductor technologies enforce the usage of fault toleranc...
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve pe...
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the red...
Checkpointing is a procedure of storing process state to a file, which is later used to re...
Checkpointing is a typical approach to tolerate failures in today’s supercomputing cluste...
The traditional single-level checkpointing method suffers from significant ov...