G-Enum histograms are a new, fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several proven theoretical results about these criteria give insight into their asymptotic behavior and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated with reference to othe...
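To make the idea of "model selection criterion plus greedy search" concrete, here is a minimal Python sketch. It is not the G-Enum criterion or the authors' algorithm: the cost used is a generic two-part stand-in (negative log-likelihood of a piecewise-constant density plus an assumed penalty of 0.5 log n per bin), and the names `bin_cost` and `greedy_histogram` and the parameter `n_elementary` are hypothetical. The naive scan over adjacent pairs is also quadratic in the number of elementary bins, so it does not reproduce the linearithmic running time described above.

```python
import math


def bin_cost(count, width, n):
    """Negative log-likelihood of one bin under a constant density."""
    if count == 0:
        return 0.0
    # The histogram density inside the bin is count / (n * width).
    return -count * math.log(count / (n * width))


def greedy_histogram(data, n_elementary=64):
    """Greedy bottom-up merging of adjacent bins (illustrative sketch).

    Starts from a fine regular grid and repeatedly merges the adjacent
    pair whose merge lowers the penalised cost the most.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    n = len(data)
    step = (hi - lo) / n_elementary

    # Count the points falling into each elementary bin.
    counts = [0] * n_elementary
    for x in data:
        idx = min(int((x - lo) / step), n_elementary - 1)
        counts[idx] += 1
    widths = [step] * n_elementary

    # Assumed per-bin penalty (BIC-like); not taken from the paper.
    penalty = 0.5 * math.log(n)

    while len(counts) > 1:
        best_gain, best_i = 0.0, None
        for i in range(len(counts) - 1):
            separate = (bin_cost(counts[i], widths[i], n)
                        + bin_cost(counts[i + 1], widths[i + 1], n))
            merged = bin_cost(counts[i] + counts[i + 1],
                              widths[i] + widths[i + 1], n)
            # Merging removes one bin (saves one penalty) but can only
            # increase the data cost.
            gain = separate + penalty - merged
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break  # no merge lowers the cost any further
        counts[best_i:best_i + 2] = [counts[best_i] + counts[best_i + 1]]
        widths[best_i:best_i + 2] = [widths[best_i] + widths[best_i + 1]]

    # Rebuild the bin edges from the merged widths.
    edges = [lo]
    for w in widths:
        edges.append(edges[-1] + w)
    return edges, counts
```

Calling `greedy_histogram(samples)` returns the bin edges and the per-bin counts. Keeping the candidate merges in a priority queue and updating only the neighbours of a merged bin is one common way to avoid rescanning every pair at each step.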
Choosing the bin sizes for a histogram can be surprisingly tricky. If there are too few bins, it is ...
Histograms are among the most popular structures for the succinct summarization of data in a variety...
A natural way to estimate the probability density function of an unknown distribution from the sampl...
G-Enum histograms are a new fast and fully automated method for irregular hist...
We regard histogram density estimation as a model selection problem. Our approach is based on the in...
We present in this paper a new fully automated method for irregular histogram...
The minimum description length principle is a general methodology for statistical modeling and infer...
Even for a well-trained statistician the construction of a histogram for a given real-valued data s...
A new fully automatic procedure for the construction of histograms is proposed...
Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-ar...
We propose a fully automatic procedure for the construction of irregular histograms. For a given num...
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization ...
Given an n-sample from some unknown density f on [0,1], it is easy to construc...
We present a data-adaptive multivariate histogram estimator of an unknown density f bas...