The count-min sketch (CMS) is a randomized data structure that estimates tokens’ frequencies in a large data stream from a compressed representation of the data obtained by random hashing. In this paper, we rely on a recent Bayesian nonparametric (BNP) view of the CMS to develop a novel learning-augmented CMS for power-law data streams. We assume that tokens in the stream are drawn from an unknown discrete distribution, which is endowed with a normalized inverse Gaussian process (NIGP) prior. Then, using distributional properties of the NIGP, we compute the posterior distribution of a token’s frequency in the stream, given the hashed data, and in turn the corresponding BNP estimates. Applications to synthetic and real data show...
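For orientation, the following is a minimal sketch of the classical CMS that the abstract takes as its starting point, not of the BNP estimator under the NIGP prior. The class name `CountMinSketch`, the parameters `width` and `depth`, and the use of salted BLAKE2b as the family of random hash functions are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Classical count-min sketch: `depth` hash rows of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, token):
        # Assumption: one salted BLAKE2b hash per row stands in for the
        # pairwise-independent hash family used in the CMS analysis.
        digest = hashlib.blake2b(
            token.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, token, count=1):
        # Each token increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, token)] += count

    def estimate(self, token):
        # Collisions can only inflate a counter, so the minimum over rows
        # is an upper bound on the true frequency.
        return min(
            self.table[row][self._bucket(row, token)]
            for row in range(self.depth)
        )


# Toy stream: "a" appears 5 times, "b" 3 times, "c" once.
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
sketch = CountMinSketch(width=64, depth=4)
for tok in stream:
    sketch.add(tok)

estimates = {tok: sketch.estimate(tok) for tok in ("a", "b", "c")}
```

The classical estimate never falls below the true count and can exceed it when tokens collide. The BNP approach described above instead treats the hashed counts as data and reports a posterior distribution for the frequency.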
We investigate the problem of estimating on the fly the frequency at which ite...
A flexible conformal inference method is developed to construct confidence intervals for the frequen...
Learning parameters from voluminous data can be prohibitive in terms of memory and computational req...
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. Bayesian (machine)...
Max-stable random sketches can be computed efficiently on fast streaming positive data sets by using...
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applicat...
Motivation: In many bioinformatics pipelines, k-mer counting is often a requir...
Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (...
We present a novel approach for the problem of frequency estimation in data streams that is based on...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2...
In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to t...
Bayesian methods constitute a popular approach to perform statistical inference and predict phenomen...
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayes...
Conservative Count-Min, a stronger version of the popular Count-Min sketch [Co...
Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable...