As an increasing number of modern big data systems utilize horizontal scaling, the general trend in the distributed systems world has been to use general-purpose commodity hardware to reduce capital expenditure. System failures resulting from the use of inferior hardware have therefore become common at scale. Further, congested datacenter networks can result in high communication latencies and packet drops at network switches. Coded computing is a novel computing technique based on error-correcting codes that aims to achieve algorithm-based fault tolerance in a distributed system that is composed of unreliable compute nodes and networks. In this thesis, we explore the application of coded computing techniques to the problem of distributed matrix m...
The problem considered is that of distributing machine learning operations of matrix multiplication ...
A novel fault-tolerant computation technique based on array Belief Propagation (BP)-decodable XOR (B...
This study presents a novel coded computation technique for distributed matrix-matrix product comput...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
A ubiquitous problem in computer science research is the optimization of computation on large data s...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
The lack of efficient resilience solutions is expected to be a major problem for the coming exascale...
Matrix multiplication is a fundamental building block in many machine learning models. As the input ...
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple ...
This paper addresses the question as to whether there is potential gain to be made from executing su...
Robustness is a fundamental and timeless issue, and it remains vital to all aspects of computation s...
Coded computation techniques provide robustness against straggling workers in distributed computing....
Coded distributed computing is an effective framework to improve the speed of distributed computing ...
In this dissertation, the constructions and schemes for flexible coding in distributed systems are i...