This research was made possible by the guidance of Priya Narasimhan.
A significant challenge in developing automated problem-diagnosis tools for distributed systems is enabling these tools to distinguish changes in system behavior caused by workload changes from those caused by faults. To address this challenge, current techniques, which are typically white-box, extract semantically rich knowledge about the target application through fairly invasive, high-overhead instrumentation. We propose and explore two scalable, low-overhead, non-invasive techniques that infer semantics about target distributed systems in a black-box manner to facilitate problem diagnosis. RAMS applies statistical analysis to hardware performance counters to predict whe...
Modern distributed systems are characterized by a growing complexity of their architecture, function...
To diagnose performance problems in production systems, many OS kernel-level monitoring and...
Increasingly, distributed systems are being used to host all manner of applications. While these pla...
A significant challenge in developing automated problem-diagnosis tools for distributed systems is t...
Many interesting large-scale systems are distributed systems of multiple communicating components. S...
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the...
Large production systems are susceptible to chronic performance problems where the system still w...
Detection, diagnosis and mitigation of performance problems in today's large-scale distributed an...
Failures in computing systems are unavoidable. Therefore, it is important to detect and diagnose fai...
This paper discusses a methodology for diagnosing performance problems for parallel and distributed ...
In order to prevent violation of service-level objectives and to guarantee good user experience, det...
Distributed software systems have become the backbone of Internet services. Failures in production ...
As today's distributed applications increase in complexity, it becomes increasingly difficult to ...
Applications may have unintended performance problems in spite of compiler optimizations, because of...
Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system...