The growing demand for always-on and low-latency cloud services is driving the creation of globally distributed datacenters. A major factor affecting service availability is the reliability of the network, both inside the datacenters and across the wide-area links connecting them. While several research efforts focus on building scale-out datacenter networks, little has been reported on real network failures and how they impact geo-distributed services. To improve the dependability of the underlying datacenter networks, this dissertation makes one of the first attempts to characterize intra-datacenter and inter-datacenter network failures from a service perspective. Specifically, we make the following contributions: 1. Analysis Methodology fo...
Most cloud computing clusters are built from unreliable, commercial off-the-shelf components compar...
Cloud computing research is in great need of statistical parameters derived from the analysis of rea...
networking; failure; reliability; measurement. Enterprises must maintain and improve the reliability ...
Thesis (Ph.D.)--University of Washington, 2018. Fast and accurate failure diagnosis remains a major ch...
The proliferation of distributed internet services has reaffirmed the need for reliable and high-per...
Large-scale data center networks are complex - comprising several thousand network devices and sever...
This Master’s thesis is about data analysis of fault report data to come up with a solution for prev...
The workloads running in the modern data centers of large-scale Internet service providers (such as A...
Modern day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliv...
Distributed computing environments are increasingly deployed over geographically spanning data cente...
Since the conception of cloud computing, ensuring its ability to provide highly reliable service has...
Cloud services are important for healthcare, banking, communication, and other purposes. Inevitably,...
Inadequate service availability is the top concern when employing Cloud computing. It has been recog...
This paper presents NetSieve, a system that aims to do automated problem inference from network trou...