Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the root cause could be contained in any one of the system’s numerous components or, worse, could be a result of interactions among them. As distributed systems continue to increase in complexity, diagnosis tasks will only become more challenging. There is a need for a new class of diagnosis techniques capable of helping developers address problems in these distributed environments. As a step toward satisfying this need, this dissertation proposes a novel technique, called request-flow comparison, for automatically localizing the sources of performance changes from the myriad potential culprits in a distributed system to just a few potential one...
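To make request-flow comparison more concrete, here is a minimal sketch of the general idea. It is not the dissertation's algorithm. It assumes that each traced request is reduced to a hypothetical `Request` record holding the ordered components it visited and its end-to-end latency. It also assumes that requests visiting the same components in the same order form one category, and that the non-problem and problem periods carry comparable workloads. Categories are ranked by how much their total latency grew, so a developer would inspect only the top few candidates instead of every component. The ranking heuristic and all names here are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Request:
    # Hypothetical trace record: ordered components visited and end-to-end latency (ms).
    path: tuple
    latency_ms: float


def group_by_structure(requests):
    """Group latencies of requests whose flows visit the same components in the same order."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.path].append(r.latency_ms)
    return groups


def compare_request_flows(baseline, problem, top_k=3):
    """Rank request categories by growth in total latency between two periods.

    Returns up to top_k tuples (path, delta_total_ms, baseline_median, problem_median),
    largest increase first. A baseline median of None means the category appears only
    in the problem period, hinting at a structural change rather than a slowdown.
    Assumes both periods saw comparable request volumes (an illustrative simplification).
    """
    before = group_by_structure(baseline)
    after = group_by_structure(problem)
    ranked = []
    for path in set(before) | set(after):
        b = before.get(path, [])
        a = after.get(path, [])
        ranked.append((
            path,
            sum(a) - sum(b),                 # change in total latency for this category
            median(b) if b else None,        # None: category absent before
            median(a) if a else None,        # None: category absent after
        ))
    # Largest increase first; break ties by path so the order is deterministic.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked[:top_k]
```

For example, suppose requests through `("frontend", "db")` took 10 ms in the baseline and 40 ms during the problem period. That category would rank above unchanged cache hits. A category seen only in the problem period would have a baseline median of `None`, which points to a change in how requests are serviced rather than a pure slowdown.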
A significant challenge in developing automated problem-diagnosis tools for distributed systems is t...
The industry-wide movement toward large data centers and cloud computing has brought many economic a...
Applications implementing cloud services, such as HDFS, Hadoop YARN, Cassandra, and HBase, are mostl...
The causes of performance changes in a distributed system often elude even its developers. This pape...
Spectroscope is a new toolset aimed at assisting developers with the long-standing challenge of perf...
Large production systems are susceptible to chronic performance problems where the system still work...
Cloud datacenters comprise hundreds or thousands of disparate application services, each having stri...
Detection, diagnosis and mitigation of performance problems in today's large-scale distributed an...
Software that performs well in one environment may be unusably slow in another, and determining the ...
Making request flow tracing an integral part of software systems creates the potential to better un...
Distributed software systems have become the backbone of Internet services. Failures in production ...
This research was made possible by the guidance of Priya Narasimhan. A significant challenge in devel...
Stragglers, which are tasks that operate significantly slower than other tasks in a system, are a bi...
It is important to keep an information system working properly with efficient performance i...
Diagnosing performance degradation in distributed systems is a complex and difficult task. Software...