Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work demonstrates the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends and consider the following question: Can we design machine learning systems that are tolerant of network unreliability during training? With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter server architecture, if every communication between the worker and the se...
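To make the dropped-message setting above concrete, the following is a minimal runnable sketch, assuming a toy linear-regression objective and a simulated per-message drop probability; the names (unreliable_sgd, drop_prob) and the toy data are illustrative assumptions, not the paper's actual algorithm or analysis.

    import numpy as np

    # Sketch of the setting: a parameter server averages worker gradients,
    # but each worker-to-server message is independently dropped with
    # probability drop_prob. All names here are illustrative assumptions.

    rng = np.random.default_rng(0)

    def grad(w, X, y):
        # Gradient of the mean squared error 0.5 * ||X w - y||^2 / len(y).
        return X.T @ (X @ w - y) / len(y)

    def unreliable_sgd(X_shards, y_shards, drop_prob=0.2, lr=0.1, steps=200):
        w = np.zeros(X_shards[0].shape[1])  # model held by the server
        for _ in range(steps):
            received = []
            # Each worker computes a gradient on its local shard
            # (mini-batching omitted for brevity).
            for Xs, ys in zip(X_shards, y_shards):
                if rng.random() >= drop_prob:  # message delivered
                    received.append(grad(w, Xs, ys))
            if received:  # average whatever arrived this round;
                w -= lr * np.mean(received, axis=0)
            # if every message was dropped, the server keeps w unchanged.
        return w

    # Toy usage: linear-regression data split across 4 workers.
    X = rng.normal(size=(400, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.01 * rng.normal(size=400)
    w_hat = unreliable_sgd(np.array_split(X, 4), np.array_split(y, 4))
    print("parameter error:", np.linalg.norm(w_hat - w_true))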
Distributed systems ranging from small local area networks to large wide area networks like the Inte...
The ever-expanding volume of data generated by network devices such as smartphones, personal compute...
Machine Learning has proven useful in recent years as a way to achieve failure prediction for in...
Whether it occurs in artificial or biological substrates, learning is a distributed phenomen...
To utilize the distributed nature of sensors, distributed machine learning has beco...
Distributed learning deals with the problem of optimizing aggregate cost functions by networked agen...
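For context, the "aggregate cost" in this line of work is typically the sum of local objectives held privately by the individual agents; a standard formulation (notation assumed here, not recovered from the truncated abstract) is

    \min_{x \in \mathbb{R}^d} \; f(x) = \sum_{i=1}^{n} f_i(x),

where agent $i$ only has access to its local cost $f_i$ and the agents must agree on a common minimizer $x$ through network communication.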
The success of deep learning may be attributed in large part to remarkable growth in the size and co...
Federated learning is a popular framework that enables harvesting edge resources’ computational powe...
This paper addresses the problem of distributed training of a machine learning model over the nodes ...
The profound impact of recent developments in artificial intelligence is unquestionable. The applica...
This paper considers a general class of iterative algorithms performing a distributed training task ...
In this thesis, I characterize the impact of network bandwidth on distributed machine learning train...
In distributed optimization and machine learning, multiple nodes coordinate to solve large problems....
Training a large-scale model over a massive data set is an extremely computation and storage intensi...