Consistent sampling is a technique for specifying, in small space, a subset S of a potentially large universe U such that the elements in S satisfy a suitably chosen sampling condition. Given a subset I ⊆ U, it should be possible to quickly compute I ∩ S, i.e., the elements in I satisfying the sampling condition. Consistent sampling has important applications in similarity estimation and in estimating the number of distinct items in a data stream. In this paper we generalize consistent sampling to the setting where we are interested in sampling size-k subsets occurring in some set in a collection of sets of bounded size b, where k is a small integer. This can be done by applying standard consistent sampling to the k-subsets of each set, but...
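As a concrete illustration of the basic technique described above, the following is a minimal Python sketch of hash-threshold consistent sampling: an element belongs to S exactly when a fixed salted hash maps it below a threshold, so I ∩ S can be computed simply by hashing the elements of I. The helper names (`_unit_hash`, `consistent_sample`), the salt, and the sampling rate are illustrative assumptions, not part of the paper.

```python
import hashlib

SALT = b"fixed-seed"   # fixing the salt fixes the sample S across all calls (illustrative)
RATE = 0.1             # sampling probability p (illustrative)

def _unit_hash(x: bytes) -> float:
    """Map an element to a pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(SALT + x).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def consistent_sample(I):
    """Return I ∩ S: the elements of I whose hash falls below RATE."""
    return [x for x in I if _unit_hash(str(x).encode()) < RATE]

# Consistency: whether an element is sampled does not depend on which subset it is queried from.
small = consistent_sample([1, 2, 3, 42])
large = consistent_sample(range(100))
assert all((x in large) == (x in small) for x in [1, 2, 3, 42])
```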
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorith...
We investigate the problem of counting the number of frequent (item)sets, a problem known to be intra...
Most of the complexity of common data mining tasks is due to the unknown amount of information conta...
We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at m...
The tasks of extracting (top-K) Frequent Itemsets (FI’s) and Association Rules (AR’s) are fundamenta...
Analyzing huge datasets becomes prohibitively slow when the dataset does not fit in main memory. App...
While there has been a lot of work on finding frequent itemsets in transaction data streams, none of...
Sampling a dataset for faster analysis and looking at it as a sample from an unknown distr...
We present an algorithm to extract a high-quality approximation of the (top-k) Frequent itemsets (F...
Many discovery problems, e.g., subgroup or association rule discovery, can naturally be cast as n-be...
Adaptive sampling [a1] is a probabilistic algorithm invented by M. Wegman (unpublished) around 1980....
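Since distinct-item estimation comes up in several of the abstracts above, here is a minimal sketch of the adaptive sampling idea: keep only elements whose hash falls below 2^(-depth), and increase depth whenever the sample overflows, scaling the sample size back up at the end. The function and parameter names (`adaptive_distinct_estimate`, `m`) and the salted hash are illustrative assumptions in the style of the sketch above, not the referenced algorithm verbatim.

```python
import hashlib

def _unit_hash(x: bytes, salt: bytes = b"fixed-seed") -> float:
    """Salted hash mapping an element to [0, 1), as in the sketch above."""
    return int.from_bytes(hashlib.sha256(salt + x).digest()[:8], "big") / 2**64

def adaptive_distinct_estimate(stream, m=64):
    """Estimate the number of distinct items in `stream` with a sample of at most m items."""
    depth = 0            # keep x iff _unit_hash(x) < 2**-depth
    sample = set()
    for x in stream:
        if _unit_hash(str(x).encode()) < 2.0 ** -depth:
            sample.add(x)
            while len(sample) > m:   # sample overflowed: halve the rate and refilter
                depth += 1
                sample = {y for y in sample
                          if _unit_hash(str(y).encode()) < 2.0 ** -depth}
    return len(sample) * 2 ** depth  # scale the retained sample back up

# e.g. adaptive_distinct_estimate(x % 500 for x in range(10_000)) is roughly 500 in expectation
```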