Operating systems often manage critical infrastructures where failures can have serious consequences, which raises concerns about their robustness. From the user's perspective, it is the service delivered by host applications that needs to be dependable. Operating systems should therefore provide comprehensive error detection and recovery services to those applications, so that the system as a whole can be dependable and secure. This paper addresses the recovery flow that takes place after an application error is detected. The goal is to combine existing techniques into a set of operating system services that support application recovery from both software and hardware errors. We describe a prototype system where these services are curre...
This paper tests the hypothesis that generic recovery techniques, such as process pairs, can survive...
Application uptime is critical to every administrator. The factors which cause system downtime are o...
Studies have shown that device drivers and extensions contain 3–7 times more bugs than other code an...
User applications and data in volatile memory are usually lost when an operating system crashes beca...
Gracefully recovering from software and hardware faults is important to ensuring highly reliable an...
Despite many decades of research, the management of errors in a live operating system remains a chal...
Critical infrastructure applications provide services upon which society depends heavily; such applic...
We propose a new approach for reacting to a wide variety of software failures, ranging from remotely...
This paper proposes an approach to software fault diagnosis in complex fault-tolerant systems, enco...
We present a new technique that enables software recovery in legacy applications by retrofitting exc...
This study focuses on how to confine error recovery to the immediate environment of a failed computa...
We present an in-depth analysis of the crash-recovery problem and propose a novel approach to recove...
Errors that occur in operating systems usually impact all user applications and may render a compute...
We present a new approach for developing robust software applications that breaks depende...