In this paper we present a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery-conscious techniques. At the top tier, we promote fine-grained recovery at the task level by introducing recovery groups to model recovery dependencies between tasks. At the middle tier, we develop highly effective mappings of dependent tasks to processor resources through careful tuning of recovery-efficiency-sensitive parameters. At the bottom tier, we advocate the use of recovery-conscious scheduling by careful serialization of dependent tasks, which provides high recovery efficiency w...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
This report aims to describe and improve a system recovery process in large-scale storage systems. I...
Resilient computing is defined as the ability of a system to stay dependable w...
Enterprises today are dealing with extremely large amounts of critical digital information that cont...
Gracefully recovering from software and hardware faults is important to ensuring highly reliable an...
This paper proposes a novel methodology and an architectural framework for handling multiple classes...
Fault-tolerant distributed applications require mechanisms to recover data lost via a process failur...
User applications and data in volatile memory are usually lost when an operating system crashes beca...
This paper presents a recovery mechanism for memory-resident databases. It uses some stable memory an...
Much research has gone into making operating systems more amenable to recovery and more resilient to...
The increasing size and complexity of software systems has led...
Data availability is critical in distributed storage systems, especially when node failures are prev...
Fault-tolerant computing encompasses the methods that let computers perform their intended function ...
Memory system design is important for providing high reliability and availability. This dissertation...