Adaptive sampling [a1] is a probabilistic algorithm invented by M. Wegman (unpublished) around 1980. It provides an unbiased estimator of the number of distinct elements (the "cardinality") of a file (a sequence of data items) of potentially large size that contains unpredictable replications. The algorithm is useful in database query optimization and in information retrieval. By standard hashing techniques [a3], [a6] the problem reduces to the following. A sequence of real numbers is given. The sequence has been formed by drawing independently and randomly an unknown number of real numbers from the interval [0, 1], after which the elements are replicated and permuted in some unknown fashion. The problem is to estimate the cardinality in a...
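The text breaks off before the algorithm itself is described. The Python sketch below shows adaptive sampling as it is usually presented in the literature. Each item is hashed to a pseudo-uniform real in [0, 1). The sampler keeps every distinct hash value below 2^-d in a sample of bounded size. Whenever the sample grows past its capacity, the depth d is increased and the sample is filtered. The estimate is the sample size times 2^d. The names `hash_to_unit` and `adaptive_sampling_estimate`, the `capacity` parameter, and the choice of SHA-256 as the hash are all illustrative assumptions, not part of the original article.

```python
import hashlib


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform real in [0, 1).

    Assumption: SHA-256 stands in for the "standard hashing techniques";
    the first 8 bytes of the digest are scaled into [0, 1).
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def adaptive_sampling_estimate(items, capacity: int = 64) -> int:
    """Estimate the number of distinct items (hypothetical helper).

    The sample holds the distinct hash values below 2**-depth. When it
    holds more than `capacity` values, depth is incremented and values
    at or above the new threshold are discarded.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    depth = 0
    sample = set()
    for item in items:
        x = hash_to_unit(item)
        if x < 2.0 ** -depth:
            sample.add(x)  # replicated items hash to the same value
            while len(sample) > capacity:
                depth += 1
                threshold = 2.0 ** -depth
                sample = {v for v in sample if v < threshold}
    # Each retained value stands for about 2**depth distinct items.
    return len(sample) * 2**depth
```

At the end of the stream, the sample is always the set of all distinct hash values below 2^-d, where d is the smallest depth whose sample fits in `capacity`. This means the result does not depend on how the items were replicated or permuted, which matches the problem statement above. For example, `adaptive_sampling_estimate([f"key-{i}" for i in range(10000)], capacity=256)` should return a value near 10000. Larger capacities trade memory for accuracy.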
Every year more and more advanced approaches to cardinality estimation are published, using learned ...
Consistent sampling is a technique for specifying, in small space, a subset S of a potentially large...
This paper analyzes the asymptotic properties of a classical algorithm: the adaptive sampling whic...
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
An estimation algorithm for a query is a probabilistic algorithm that computes an approximat...
The problem of estimating the number n of distinct keys of a large collection ...
A sequential sampling algorithm or adaptive sampling algorithm is a sampling algorithm that ...
Distinct random sampling is widely used in many applications due to its ability to answer aggregate ...
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest resea...
Adaptive sampling, which selects samples sequentially, is known to be more efficient than traditional...
A new class of algorithms to estimate the cardinality of very large multisets using constant...
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such sample...
One of the biggest research challenges in KDD and Data Mining is to develop methods that scale up w...
We present an adaptive, random sampling algorithm for estimating the size of general queries...
This article considers the problem of cardinality estimation in data stream applications. We present...