Gracefully recovering from software and hardware faults is important to ensuring highly reliable and available systems. Operating systems have privileged access to all aspects of system operation, so a fault in the operating system can affect the entire system. Existing approaches to operating system recovery either do not protect the entire system or require a completely new operating system design. This dissertation presents a new approach to fault recovery in operating systems called Recovery Domains. This approach allows recovery from unanticipated faults in commodity operating systems. Recovery is organized around the concept of a dynamic request. Operating system entry points initiate requests to perform some action. System...
System and application failures are all too common. In this dissertation we argue that operating sys...
A novel approach to application fault recovery based on autonomic computing works by accurately moni...
This paper describes how the transformational framework developed in [Liu91, LJ92] is applied to bac...
User applications and data in volatile memory are usually lost when an operating system crashes beca...
Operating systems often manage critical infrastructures where failures can have serious consequences...
We present an in-depth analysis of the crash-recovery problem and propose a novel approach to recove...
When an operating system crashes and hangs, it leaves the machine in an unusable state. A...
This study focuses on how to confine error recovery to the immediate environment of a failed computa...
We present a new technique that enables software recovery in legacy applications by retrofitting exc...
Operating system lockup errors can render a computer unusable by preventing the execution of other prog...
Much research has gone into making operating systems more amenable to recovery and more resilient to...
Traditional reliability-related models for fault-tolerant systems are used to predict system reliabi...
In this paper we present a recovery-conscious framework for improving the fault resiliency and recov...
Large scale distributed computing systems have been extensively utilized to host critical applicatio...