Abstract—In this paper, we present an automated on-line service for troubleshooting performance problems in server clusters caused by unintended vicious cycles. The tool complements a large volume of prior performance troubleshooting and diagnostic literature for server farms that identifies problems arising due to resource bottlenecks or failed components. We show that unintended interactions between components in large-scale systems can cause performance problems even in the absence of bottlenecks or failures. Our tool leverages discriminative sequence mining to identify anomalous sequences of events that are candidates for blame for the performance problem. The tool looks for patterns consistent with “vicious cycles” or unstable beha...
Distributed systems have become pervasive in current society. From laptops and mobile phones, to ser...
Abstract—Performance problems, which can stem from different system components, such as network, me...
Network management in a large organization often involves, whether explicitly or implicitly, the r...
Large production systems are susceptible to chronic performance problems where the system still w...
With the explosion of the number of distributed applications, a new dynamic server environment emerg...
Distributed computing environments are increasingly deployed over geographically spanning data cente...
Cloud datacenters comprise hundreds or thousands of disparate application services, each having stri...
Performance anomaly detection is crucial for long-running, large-scale distributed systems. However,...
Abstract—In this paper, we present CLUE, a system event analytics tool for black-box performance dia...
This thesis investigates the possibility of using anomaly detection on performance data of virtual s...
Large-scale clusters are growing at a rapid pace, and the resulting amount of monitoring data produc...
Detection, diagnosis and mitigation of performance problems in today's large-scale distributed an...