Designing scalable estimation algorithms is a core challenge in modern statistics. Here we introduce a framework, based on parallel approximants, that addresses this challenge and yields estimators with provable properties operating on the entirety of very large, distributed data sets. We first formalize the class of statistics that admit straightforward calculation in distributed environments through independent parallelization. We then show how to use such statistics to approximate arbitrary functional operators in appropriate spaces, yielding a general estimation framework that does not require data to reside entirely in memory. We characterize the $L^2$ approximation properties of our approach and provide fully implemented examples of ...
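The notion of independent parallelization can be illustrated with a toy example: a statistic whose per-partition sufficient statistics can be computed in isolation on each shard and then merged, so no worker ever needs another worker's data. The sketch below is not taken from the paper; the function names `partial_stats` and `combine` are hypothetical, and `concurrent.futures` merely stands in for whatever distributed backend is actually used. It is a minimal illustration, assuming the target statistic (here a mean and variance) decomposes into additive per-partition summaries.

```python
# Minimal sketch of an independently parallelizable statistic:
# per-partition sufficient statistics (count, sum, sum of squares)
# are computed in isolation and then combined, so no partition
# ever needs to see another partition's data.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def partial_stats(chunk):
    """Sufficient statistics for the mean/variance on one partition."""
    x = np.asarray(chunk, dtype=float)
    return x.size, x.sum(), np.square(x).sum()

def combine(parts):
    """Merge per-partition summaries into a global mean and variance."""
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    ss = sum(p[2] for p in parts)
    mean = s / n
    var = ss / n - mean ** 2
    return mean, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for shards of a large, distributed data set.
    partitions = [rng.normal(size=100_000) for _ in range(8)]
    with ProcessPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, partitions))
    print(combine(parts))
```

The key property, which the paper's framework formalizes, is that the per-partition map can run anywhere and the combine step only sees fixed-size summaries rather than the raw data.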