Faced with massive data, is it possible to trade off (statistical) risk, and (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learn-ing problem, we show how we can strategi-cally summarize the data (control space) in or-der to trade off risk and time when data is generated by a probabilistic model. Our sum-marization is based on coreset constructions from computational geometry. We also de-velop an algorithm, TRAM, to navigate the space/time/data/risk tradeoff in practice. In par-ticular, we show that for a fixed risk (or data size), as the data size increases (resp. risk in-creases) the running time of TRAM decreases. Our extensive expe...
The last several years have seen the emergence of datasets of an unprecedented scale, and solving va...
In this thesis, we study key questions that touch upon many important problems in practice which are...
A model of unsupervised learning is studied, where the environment provides N-dimensional input exam...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
This dissertation explores Machine Learning in the context of computationally intensive simulations....
Unsupervised learning - i.e., learning with unlabeled data - is increasingly important given today\u...
AbstractIn this paper we propose a family of algorithms combining tree-clustering with conditioning ...
Fast and eective unsupervised clustering is a fundamental tool in unsupervised learning. Here is a n...
Many physical quantities around us vary across space or space-time. An example of a spatial quantity...
Many simulation data sets are so massive that they must be distributed among disk farms attached to ...
We examine the learning-curve sampling method, an approach for applying machinelearning algorithms t...
Abstract. When dealing with datasets containing a billion instances or with sim-ulations that requir...
Abstract. When dealing with datasets containing a billion instances or with sim-ulations that requir...
Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, ...
This thesis explores one of the most fundamental questions in Machine Learning, namely, how should t...
The last several years have seen the emergence of datasets of an unprecedented scale, and solving va...
In this thesis, we study key questions that touch upon many important problems in practice which are...
A model of unsupervised learning is studied, where the environment provides N-dimensional input exam...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
This dissertation explores Machine Learning in the context of computationally intensive simulations....
Unsupervised learning - i.e., learning with unlabeled data - is increasingly important given today\u...
AbstractIn this paper we propose a family of algorithms combining tree-clustering with conditioning ...
Fast and eective unsupervised clustering is a fundamental tool in unsupervised learning. Here is a n...
Many physical quantities around us vary across space or space-time. An example of a spatial quantity...
Many simulation data sets are so massive that they must be distributed among disk farms attached to ...
We examine the learning-curve sampling method, an approach for applying machinelearning algorithms t...
Abstract. When dealing with datasets containing a billion instances or with sim-ulations that requir...
Abstract. When dealing with datasets containing a billion instances or with sim-ulations that requir...
Data analysis is changing fast. Driven by a vast range of application domains and affordable tools, ...
This thesis explores one of the most fundamental questions in Machine Learning, namely, how should t...
The last several years have seen the emergence of datasets of an unprecedented scale, and solving va...
In this thesis, we study key questions that touch upon many important problems in practice which are...
A model of unsupervised learning is studied, where the environment provides N-dimensional input exam...