Utilizing the structure of a probabilistic model can significantly increase its compression efficiency and learning speed. We consider these potential improvements under two naturally-omnipresent structures.

Power-Law: English words and many other natural phenomena are well known to follow a power-law distribution. Yet this ubiquitous structure has never been shown to help compress or predict these phenomena. It is known that the class of unrestricted distributions over an alphabet of size k and blocks of length n can never be compressed with diminishing per-symbol redundancy when k > n. We show that under power-law structure, in expectation we can compress with diminishing per-symbol redundancy for k growing as large as sub-exponential in n.

For learni...
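The compressibility claim above rests on the fact that a power-law source over a large alphabet concentrates most of its probability mass on a small number of frequent symbols, so its per-symbol entropy stays far below the log k bits an unstructured distribution could require. The following minimal sketch (not from the paper; the Zipf exponent alpha = 1.5 and the alphabet sizes are illustrative assumptions) makes that gap concrete.

import numpy as np

def zipf_entropy_bits(k: int, alpha: float = 1.5) -> float:
    """Per-symbol Shannon entropy (bits) of the Zipf law p_i ∝ i^(-alpha), i = 1..k."""
    ranks = np.arange(1, k + 1, dtype=float)
    p = ranks ** -alpha
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# Compare the entropy of a power-law source with the uniform bound log2(k)
# as the alphabet grows (illustrative alphabet sizes).
for k in (10**3, 10**5, 10**7):
    print(f"k = {k:>10,}: entropy ≈ {zipf_entropy_bits(k):5.2f} bits "
          f"vs log2(k) ≈ {np.log2(k):5.2f} bits")

For any exponent alpha > 1 the per-symbol entropy remains bounded as k grows, while log2(k) grows without bound; this gap is the intuition behind compressing power-law sources over alphabets much larger than the block length.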
The success of machine learning has resulted from its structured representation of data. Similar dat...
In recent years, researchers have realized the difficulties of fitting power-law distributions prope...
Distributed learning of probabilistic models from multiple data repositories with minimum communicat...
We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling where ...
Standard statistical models of language fail to capture one of the most striking properties of natur...
We consider language modelling (LM) as a multi-label structured prediction task by re-framing traini...
Broad distributions appear frequently in empirical data obtained from natural systems even in seemin...
This paper focuses on a general setup for obtaining sample size lower bounds for learning concept cl...
Estimating distributions over large alphabets is a fundamental machine-learning tenet. Yet no method...
With the increasing availability of large datasets, machine learning techniques are becoming an incr...
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which mai...
Algorithms for learning to rank can be inefficient when they employ risk functions that use...
This article provides a critical assessment of the Gradual Learning Algorithm (GLA) for probabilisti...
Statistical learning refers to the ability to identify structure in the input based on its statistic...
In this thesis, we discuss two issues in the learning to rank area, choosing effective objective lo...