Periodic application of time-redundant error checking offers a trade-off between error detection latency and performance degradation. The goal is to achieve high error coverage while satisfying performance requirements. We derive the optimal scheduling of checking patterns so that the available checking capability is distributed uniformly and error coverage is maximized. Synchronous buffering designs using data forwarding and dynamic reconfiguration are described. Efficient single-cycle diagnosis is implemented by error pattern analysis and a direct-mapped recovery cache. A rollback recovery scheme using start-up control for local recovery is also presented.
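The idea of spreading a fixed checking budget uniformly over a period can be illustrated with a small sketch. This is not the scheduling derivation from the paper; it only assumes a hypothetical model in which `checks` time-redundant checks must be placed within a window of `period` cycles, and the function names are invented for illustration.

```python
def uniform_check_schedule(period: int, checks: int) -> list[int]:
    """Return 0-based cycle indices at which a check runs.

    Hypothetical model: `checks` checking slots are spread as evenly
    as integer cycle indices allow over a window of `period` cycles.
    """
    if not 0 < checks <= period:
        raise ValueError("need 0 < checks <= period")
    return [(i * period) // checks for i in range(checks)]


def max_detection_gap(schedule: list[int], period: int) -> int:
    """Largest number of cycles between consecutive checks.

    The schedule is treated as repeating, so the gap from the last
    check wraps around to the first check of the next period.
    """
    n = len(schedule)
    gaps = [
        (schedule[(i + 1) % n] - schedule[i]) % period or period
        for i in range(n)
    ]
    return max(gaps)


if __name__ == "__main__":
    sched = uniform_check_schedule(period=10, checks=3)
    print(sched, max_detection_gap(sched, period=10))
```

Under this model the longest interval without a check is at most ceil(period / checks) cycles, which makes the trade-off visible: more checks per period shorten the worst-case detection latency but consume more cycles.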
This work provides an analysis of checkpointing strategies for minimizing expe...
This paper describes a checkpoint comparison and optimistic execution technique for error detection ...
We propose a low cost concurrent error detection strategy to improve the Reliability, Availability, ...
Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Aeronautics ...
Processor arrays can provide an attractive architecture for some applications. Featuring modularity,...
Coordinated Science Laboratory was formerly known as Control Systems Laboratory. SDIO/IST and Office o...
This paper investigates the optimal number of processors to execute a parallel job, whose speedup pr...
This paper describes a methodology based on dependency graphs for doing concurrent runtime error det...
In this paper we propose two algorithm-level time redundancy based Concurrent Error Detec...
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the red...
Processor error detection can be reduced in cost significantly by exploiting the parallelism that ex...
A methodology for designing systems with concurrent error detection capability is introduced. The pr...
This thesis studies a forward recovery strategy using checkpointing and optimistic execution in para...