A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although our solution is broadly applicable, this paper focuses on applications involving the count-min sketch algorithm and a non-linear variation thereof. The performance is compared to that of frequentist and Bayesian alternatives through simulations and experiments with data sets of SARS-CoV-2 DNA seq...
Max-stable random sketches can be computed efficiently on fast streaming positive data sets by using...
Data sketching is a critical tool for distinct counting, enabling multisets to be represented by com...
We present a novel approach for the problem of frequency estimation in data streams that is based on...
A flexible method is developed to construct a confidence interval for the frequency of a queried obj...
Conformal prediction is an assumption-lean approach to generating distribution-free prediction inter...
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applicat...
Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (...
When one observes a sequence of variables $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP...
Conservative Count-Min, a stronger version of the popular Count-Min sketch [Co...
A sketch is a probabilistic data structure that is used to record frequencies of items in a multi-se...
In this paper, we investigate the problem of estimating the number of times data items that ...
With local differential privacy (LDP), users can privatize their data and...
We extend conformal prediction to control the expected value of any monotone loss function. The algo...
We study the problem of uncertainty quantification for time series prediction, with the goal of prov...
There are many types of statistical inferences that can be used today: Frequentist, Bayesian, Fiduci...