Abstract—Detecting and localizing performance faults is crucial for operating large enterprise data centers. This problem is relatively straightforward to solve if each entity (applications, servers, business processes) within the data center can be instrumented and monitored explicitly. Unfortunately, such an instrument-everything approach is often not tenable because of the limits imposed by enterprises on the permissible amounts of instrumentation intrusiveness and monitoring overhead. In this paper, we address the problem of achieving high accuracy in detecting and localizing performance faults in data centers, while minimizing the required instrumentation intrusiveness and overhead. We present novel algorithms for solving three key sub-...
Contemporary datacenters comprise hundreds or thousands of machines running applications requiring h...
Abstract Fault localization, a central aspect of network fault management, is a process of deducing ...
Abstract Fault localization, a central aspect of network fault management, is a process of deducing t...
The proliferation of distributed internet services has reaffirmed the need for reliable and high-per...
Large-scale data center networks are complex - comprising several thousand network devices and sever...
Data centers today are growing in size and becoming harder to manage. It is more important than ever...
Cloud datacenters comprise hundreds or thousands of disparate application services, each having stri...
Thesis (Ph.D.)--University of Washington, 2018. Fast and accurate failure diagnosis remains a major ch...
We describe a new fault localization technique for software bugs in large-scale computing systems. O...
In this paper, the performance limits of faults localization are investigated using synchrophasor da...
Localizing the sources of performance problems in large enterprise networks is extremely challenging...
Modern enterprise networks encompass tens of thousands of network entities and present a very challe...
Troubleshooting network performance issues is a challenging task especially in large-scale data cent...
When a performance crisis occurs in a datacenter, rapid recovery requires quickly recognizing whethe...
The paper describes first results of an attempt to develop a general tool for localizing faults in a...