We study the distributed machine learning problem where the n feature-response pairs are partitioned among m machines uniformly at random. The goal is to approximately solve an empirical risk minimization (ERM) problem with the minimum amount of communication. The divide-and-conquer (DC) method, proposed several years ago, lets every worker machine independently solve the same ERM problem using its local feature-response pairs and lets the driver machine combine the solutions. This approach is one-shot and thereby extremely communication-efficient. Although the DC method has been studied in many prior works, a reasonable generalization bound had not been established before this work. For the ridge regression problem, we show that the p...
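The one-shot DC scheme described in this abstract can be illustrated with a minimal sketch: each worker solves ridge regression on its own random shard in closed form, and the driver simply averages the m local solutions. This assumes the standard averaging variant of DC; the function names (`local_ridge`, `dc_ridge`) and the choice of regularization scaling are illustrative, not taken from the cited paper.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Closed-form solution of min_w ||Xw - y||^2 / n + lam * ||w||^2 on one shard."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def dc_ridge(X, y, lam, m, seed=None):
    """One-shot divide-and-conquer ridge: average the m independently computed local solutions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])            # uniformly random partition of the rows
    shards = np.array_split(idx, m)
    local = [local_ridge(X[s], y[s], lam) for s in shards]  # run in parallel on workers in practice
    return np.mean(local, axis=0)                # driver combines with a single round of communication

# Usage: compare the one-shot DC estimate against the centralized ridge solution.
rng = np.random.default_rng(0)
n, d, m, lam = 10_000, 20, 10, 1e-2
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
print(np.linalg.norm(dc_ridge(X, y, lam, m, seed=0) - local_ridge(X, y, lam)))
```

Because each worker communicates only its d-dimensional local solution once, the total communication is O(md), independent of n, which is what makes the method attractive despite the bias introduced by averaging regularized estimators.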
A common approach to statistical learning with big-data is to randomly split it among m machines and...
A wide variety of problems in machine learning, including exemplar clustering, document summarizatio...
We investigate the performance of distributed learning for large-scale linear regression where the m...
Distributed machine learning bridges the traditional fields of distributed systems and machine learn...
We develop a new distributed algorithm to solve the ridge regression problem with feature partitioni...
We propose LOCO, a distributed algorithm which solves large-scale ridge regression. LOCO randomly a...
Distributed learning provides an attractive framework for scaling the learning task by sharing the c...
We propose a new distributed algorithm for empirical risk minimization in machine learning. The alg...
In recent studies, the generalization properties for distributed learning and random features assume...
We explore the connection between dimensionality and communication cost in distributed learning prob...
We consider the problem of distributed mean estimation (DME), in which n machines are each given a lo...
With the growth in size and complexity of data, methods exploiting low-dimensional structure, as wel...
This paper studies hypothesis testing and parameter estimation in the context of the divide and conq...
We live in an age of big data. Analyzing modern data sets can be very difficult because they usually...
Distributed learning facilitates the scaling-up of data processing by distributing the computational...