Motivated by diverse applications in ecology, genetics, and language modeling, researchers in learning, computer science, and information theory have recently studied several fundamental statistical questions in the large domain regime, where the domain size is large relative to the number of samples. We study three such basic problems with rich history and wide applications. In the course of analyzing these problems, we also provide provable guarantees for several existing practical estimators and propose estimators with better guarantees. Competitive distribution estimation and classification: Existing theory does not explain why absolute-discounting, Good-Turing, and related estimators outperform the asymptotically min-max optimal esti...
Many modern applications fall into the category of "large-scale" statistical problems, in which b...
Modern technological advances have prompted massive-scale data collection in many modern fields such ...
In this dissertation, we make progress on certain algorithmic problems broadly over two computationa...
Estimating distributions over large alphabets is a fundamental machine-learning tenet. Yet no method...
We live in a probabilistic world---a world full of distributions from which we sample. Learning, evo...
Estimating distributions over large alphabets is a fundamental machine-learning tenet. Yet ...
Modern data science calls for statistical inference algorithms that are both data-efficient and comp...
Recent advances in genetics, computer vision, and text mining are accompanied by analyzing data comi...
Presented on September 18, 2017 at 11:00 a.m. in the Klaus Advanced Computing Building, Room 1116E. I...
The last several years have seen the emergence of datasets of an unprecedented scale, and solving va...
Feature allocation models generalize classical species sampling models by allowing every observation...
The problem of estimating discovery probabilities has regained popularity in recent years due to its...
Many results in statistics and information theory are asymptotic in nature, with the implicit assump...
This paper studies hypothesis testing and parameter estimation in the context of the divide and conq...
We derive competitive tests and estimators for several properties of discrete distributions, based o...