Random sampling has become a crucial component of modern data management systems. Although the literature on database sampling is large, there has been relatively little work on the problem of maintaining a sample in the presence of arbitrary insertions and deletions to the underlying dataset. Most existing maintenance techniques apply either to the insert-only case or to datasets that do not contain duplicates. In this paper, we provide a scheme that maintains a Bernoulli sample of an underlying multiset in the presence of an arbitrary stream of updates, deletions, and insertions. Importantly, the scheme never needs to access the underlying multiset. Such Bernoulli samples are easy to manipulate, and are well suited to parallel processing ...
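The duplicate-free special case of such a scheme is simple enough to sketch. The snippet below is a minimal illustration only, not the paper's multiset algorithm (which additionally maintains tracking counters to handle duplicate items); the class and method names are illustrative. Each inserted item enters the sample independently with probability q, and a deletion simply removes the item from the sample if present, all without consulting the base data.

    import random

    class BernoulliSample:
        """Minimal sketch: maintain a Bernoulli(q) sample of a duplicate-free
        dataset under arbitrary insertions and deletions, never accessing the
        base data. (The multiset case requires tracking counters.)"""

        def __init__(self, q: float, rng=random):
            self.q = q
            self.rng = rng
            self.sample = set()

        def insert(self, item) -> None:
            # Each item is included independently with probability q.
            if self.rng.random() < self.q:
                self.sample.add(item)

        def delete(self, item) -> None:
            # Discarding a deleted item (if sampled) preserves Bernoulli(q)
            # semantics for the remaining dataset.
            self.sample.discard(item)

        def estimate_size(self) -> float:
            # |sample| / q is an unbiased estimator of the dataset size.
            return len(self.sample) / self.q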
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
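The adaptive sampling scheme referenced here is commonly presented as follows (a sketch under the usual textbook formulation, as analyzed by Flajolet; the capacity default and hash construction are illustrative assumptions): keep a bounded set of distinct hashed values at sampling depth d, halve the sampling rate whenever the set overflows, and estimate the number of distinct values as |sample| * 2^d.

    import hashlib

    def _hash01(item: str) -> float:
        # Map an item to a pseudo-uniform value in [0, 1) via a fixed hash.
        digest = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    class AdaptiveSample:
        """Sketch of Wegman-style adaptive sampling for distinct-value
        estimation with a fixed storage budget."""

        def __init__(self, capacity: int = 64):
            self.capacity = capacity
            self.depth = 0        # current level; sampling rate is 2**-depth
            self.sample = set()   # retained distinct values

        def insert(self, item: str) -> None:
            if _hash01(item) < 2.0 ** -self.depth:
                self.sample.add(item)
                while len(self.sample) > self.capacity:
                    # Halve the rate and purge values that no longer qualify.
                    self.depth += 1
                    self.sample = {x for x in self.sample
                                   if _hash01(x) < 2.0 ** -self.depth}

        def estimate_distinct(self) -> float:
            # Each retained value represents 2**depth distinct values.
            return len(self.sample) * 2.0 ** self.depth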
Existing random sampling methods have at least one of the following disadvantages: they 1) are a...
Random sampling is an appealing approach to building synopses of large data streams because random samp...
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such sample...
Perhaps the most flexible synopsis of a database is a random sample of the data; such samples are wi...
A variety of schemes have been proposed in the literature to speed up query processing and analytics...
Random sampling is a popular technique for providing fast approximate query answers, especially in d...
Random sampling is a well-known technique for approximate processing of large datasets. We introduce...
We consider the problem of maintaining a warehouse of sampled data that “shadows” a full-scale data...
We present a modification of the Durstenfeld-Fisher-Yates random-permutation algorithm for use in sa...
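For context, the textbook baseline being modified here draws a uniform sample of size k by running only the first k swap steps of the Durstenfeld-Fisher-Yates shuffle. The sketch below illustrates that baseline only, not the paper's modification; the function name and parameters are illustrative.

    import random

    def sample_without_replacement(items, k, rng=random):
        """Draw k items uniformly without replacement by running the first k
        steps of a Durstenfeld-Fisher-Yates shuffle (sketch)."""
        a = list(items)              # work on a copy; the prefix becomes the sample
        n = len(a)
        if not 0 <= k <= n:
            raise ValueError("k must be between 0 and len(items)")
        for i in range(k):
            j = rng.randrange(i, n)  # pick uniformly from the unshuffled suffix
            a[i], a[j] = a[j], a[i]
            # Invariant: a[:i+1] is a uniform random (i+1)-subset permutation.
        return a[:k]

Running the full loop (k = n) recovers the complete shuffle; stopping early is what makes the algorithm useful for sampling.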