Recent approaches to operating system (OS) crash recovery have attempted to design a high-coverage, component-agnostic recovery infrastructure [1, 3]. To successfully recover from an otherwise-fatal crash, it is necessary to determine a safe execution point from which to resume operation consistently. Common restart strategies attempt to bring the OS back from a faulty execution state to a safe state and selectively replay execution. Unfortunately, a faulty execution leading to a crash can result in many logical inconsistencies with complex dependencies among execution contexts (e.g., kernel threads) and OS subsystems that need to be tracked and rolled back if necessary. Both heavyweight [3] and lightweight mechanisms [1] have been recently proposed...
The high complexity of modern software, and our pervasive reliance on that software, has made the pr...
We present a method to recover from failures caused by software bugs. Our method relies on two key ...
Because of shrinking structure sizes and operating voltages, computing hardware exhibits an...
We present an in-depth analysis of the crash-recovery problem and propose a novel approach to recove...
User applications and data in volatile memory are usually lost when an operating system crashes beca...
When an operating system crashes and hangs, it leaves the machine in an unusable state. A...
Much research has gone into making operating systems more amenable to recovery and more resilient to...
Gracefully recovering from software and hardware faults is important to ensuring highly reliable an...
100 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2008. Fault injection experiments s...
Studies have shown that device drivers and extensions contain 3–7 times more bugs than other code an...
System and application failures are all too common. In this dissertation we argue that operating sys...
Experiences with computer systems indicate an inconvenient truth: computers fail and they fail i...
Soft errors are emerging with the ongoing reduction of structure sizes in current and futu...
Traditional reliability-related models for fault-tolerant systems are used to predict system reliabi...