Diagnosing and repairing problems in complex distributed systems has always been challenging. Many kinds of problems can occur: routers can be misconfigured, nodes can be hacked, and the control software can have bugs. The complexity and scale of today’s distributed systems make diagnosis harder still. Provenance is an attractive way to diagnose faults in distributed systems, because it can track the causality from a symptom to a set of root causes. Prior work on network provenance has successfully applied provenance to distributed systems. However, existing techniques cannot explain problems beyond the presence of faulty events, and they offer limited help with finding repairs. In this dissertation, we extend provenance to handle...
Distributed systems play a critical role in people's daily lives. They provide functions such as ...
In this paper, we explore the use of provenance for analyzing execution dynamics in distributed syst...
We consider issues of fault tolerance for distributed computing systems at two levels of system desi...
Diagnosing and repairing problems in complex distributed systems has always been challenging. A wide...
In large-scale networks, many things can go wrong: routers can be misconfigured, programs can be bug...
In this paper, we propose a new approach to diagnosing problems in complex networks. Our approach i...
When debugging a distributed system, it is sometimes necessary to explain the absence of an event – ...
Failures in computing systems are unavoidable. Therefore, it is important to detect and diagnose fai...
The ability to reason about changes in a distributed system’s state enables network administrators t...
Operators of distributed systems often find themselves needing to answer forensic questions, to perf...
Network accountability, forensic analysis, and failure diagnosis are becoming increasingly important...
We demonstrate NetTrails, a declarative platform for maintaining and interactively querying network ...