Guaranteeing high availability of networks virtually hinges on the ability to handle and recover from bugs and failures. Yet, despite the advances in verification, testing, and debugging, production networks remain susceptible to large-scale failures, often due to deterministic bugs. This paper explores the use of input transformations as a viable method for recovering from such deterministic bugs. In particular, we introduce an online system, Tardis, for overcoming deterministic faults by using a blend of program analysis and runtime program data to systematically determine the fault-triggering input events and using domain-specific models to automatically generate transformations of the fault-triggering inputs that are both safe and se...
Bugs in network hardware can cause tremendous problems. However, programmable network devices have t...
Many critical services, such as e-commerce, emergency response, or even remote surgeries, rely on co...
As the size of networks increases, real-time fault management becomes difficult due to the volume of...
© 2004-2012 IEEE. Over the past few years, software-defined networking (SDN) has stimulated worldwid...
Large-scale networks are among the most complex software infrastructures in existence. Unfortunately...
Tolerating and recovering from link and switch failures are fundamental requirements of most network...
Debugging software is challenging because of the increasing complexity of software and hardware, and...
As our lives become ever more dependent on network connectivity, it becomes increasingly more import...
Traditional computer networks require manual configuration of potentially hundreds of forwarding dev...
Bugs in programs are often introduced when programs evolve from a stable version to a new version. I...
Failures in computing systems are unavoidable. Therefore, it is important to detect and diagnose fai...
Modern computer networks are complex, incorporating hundreds or thousands of network devices from mu...
This thesis presents two unique sets of fault injections on mission-critical computer systems with t...
Traditionally, distributed systems requiring high dependability were designed using custom hardware ...