Despite many decades of research, the management of errors in a live operating system remains a challenging problem. This thesis presents CuriOS, an operating system that incorporates several new error management techniques that significantly improve reliability. Errors detected by both hardware and software are signaled using language exception handling mechanisms. Unhandled exceptions do not crash the operating system and are dispatched to recovery routines. The architecture of CuriOS is influenced by microkernel design principles. Individual operating system services are assigned separate protection domains. This componentization provided by traditional microkernel designs helps confine errors. However, an error that occurs in a microker...
Operating systems and hypervisors enable the collection and extraction of rich information on applic...
Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations,...
Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardwar...
User applications and data in volatile memory are usually lost when an operating system crashes beca...
Errors that occur in operating systems usually impact all user applications and may render a compute...
With the advance of technology, current systems are becoming much more powerful in computation, much...
The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthin...
We present an in-depth analysis of the crash-recovery problem and propose a novel approach to recove...
This study focuses on how to confine error recovery to the immediate environment of a failed computa...
We present a set of automated techniques that enable software systems to survive otherwise f...
As the number of CPU cores in high-performance computing platforms continues to grow, the availabili...
Operating systems enable collecting and extracting rich information on application execution charact...
Gracefully recovering from software and hardware faults is important to ensuring highly reliable an...
Hardware errors are projected to increase in modern computer systems due to shrinking feature sizes ...