Thesis (Ph.D.)--University of Washington, 2018. Fast and accurate failure diagnosis remains a major challenge for datacenter operators. Modern datacenter applications are increasingly architected around loosely coupled modular components, each of which can scale and evolve independently. However, application failures then become much harder to detect and localize. The challenges are threefold: complex component dependencies, gray failures, and unpredictable component behavior. My thesis is that fast and accurate failure diagnosis for datacenter applications is possible using three key ideas: (1) a global view of component interactions and dependencies, (2) a penalized-regression-based failure localization algorithm that localize...
Thesis (Ph.D.)--University of Washington, 2016-09. Data center networks are a key component to the exp...
Cloud computing is a novel technology in the field of distributed computing. Usage of Cloud computin...
A mechanism to detect and manage failures in a datacenter network exploiting load balancing among eq...
The proliferation of distributed internet services has reaffirmed the need for reliable and high-per...
Large-scale data center networks are complex - comprising several thousand network devices and sever...
Distributed software systems have become the backbone of Internet services. Failures in production ...
The growing demand for always-on and low-latency cloud services is driving the creation of globally ...
Data center downtime causes business losses over a million dollars per hour. 24x7-hour data availabi...
Data center networks (DCNs) are inherently failure-prone owing to the existence of many links, switc...
A major problem in managing large-scale datacenters is diagnosing and fixing machine fail...
The workloads running in the modern data centers of large-scale Internet service providers (such as A...
Distributed computing systems cover a broad range of computing infrastructures, which are heterogene...
Modern day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliv...
Most recent network failure diagnosis systems focused on data center networks where complex measurem...