Distributed deep learning has become common as a way to reduce overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs) as deep models and data sets grow in size. However, data communication between computing devices can become a bottleneck that limits system scalability. How to address this communication problem has recently become an active research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce communication cost. At the algorithmic level, we compare different algorit...
In the realm of distributed computing, collective operations involve coordinated communication and s...
The distributed training of deep learning models faces two issues: efficiency and privacy. First of ...
Deep learning has been a very popular topic in the Artificial Intelligence industry in recent years and can b...
The success of deep learning may be attributed in large part to remarkable growth in the size and co...
In recent years, the rapid development of new generation information technology has resulted in an u...
The rapid growth of data and ever increasing model complexity of deep neural networks (DNNs) have en...
In recent years, the rapid development of new generation information technology has resulted in an u...
In recent years, the rapid development of new generation information technology has resulted in an u...
As deep learning techniques become more and more popular, there is the need to move these applicatio...
Training a deep neural network (DNN) on a single machine takes a long time. To accelerate the tra...
Training a deep neural network (DNN) on a single machine takes a long time. To accelerate the tra...
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., G...
In distributed optimization and machine learning, multiple nodes coordinate to solve large problems....
In distributed optimization and machine learning, multiple nodes coordinate to solve large problems....
Accelerating and scaling the training of deep neural networks (DNNs) is critical to keep up with gro...