Random sampling has become a crucial component of modern data management systems. Although the literature on database sampling is large, there has been relatively little work on the problem of maintaining a sample in the presence of arbitrary insertions and deletions to the underlying dataset. Most existing maintenance techniques apply either to the insert-only case or to datasets that do not contain duplicates. In this paper, we provide a scheme that maintains a Bernoulli sample of an underlying multiset in the presence of an arbitrary stream of updates, deletions, and insertions. Importantly, the scheme never needs to access the underlying multiset. Such Bernoulli samples are easy to manipulate, and are well suited to parallel processing ...
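The duplicate-free special case of such a scheme is simple enough to sketch. The snippet below is a minimal illustration only, not the paper's multiset algorithm (which additionally maintains tracking counters to handle duplicate items); the class and method names are illustrative. Each inserted item enters the sample independently with probability q, and a deletion simply removes the item from the sample if present, all without consulting the base data.

    import random

    class BernoulliSample:
        """Minimal sketch: maintain a Bernoulli(q) sample of a duplicate-free
        dataset under arbitrary insertions and deletions, never accessing the
        base data. (The multiset case requires tracking counters.)"""

        def __init__(self, q: float, rng=random):
            self.q = q
            self.rng = rng
            self.sample = set()

        def insert(self, item) -> None:
            # Each item is included independently with probability q.
            if self.rng.random() < self.q:
                self.sample.add(item)

        def delete(self, item) -> None:
            # Discarding a deleted item (if sampled) preserves Bernoulli(q)
            # semantics for the remaining dataset.
            self.sample.discard(item)

        def estimate_size(self) -> float:
            # |sample| / q is an unbiased estimator of the dataset size.
            return len(self.sample) / self.q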
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
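The adaptive sampling scheme referenced here is commonly presented as follows (a sketch under the usual textbook formulation, as analyzed by Flajolet; the capacity default and hash construction are illustrative assumptions): keep a bounded set of distinct hashed values at sampling depth d, halve the sampling rate whenever the set overflows, and estimate the number of distinct values as |sample| * 2^d.

    import hashlib

    def _hash01(item: str) -> float:
        # Map an item to a pseudo-uniform value in [0, 1) via a fixed hash.
        digest = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    class AdaptiveSample:
        """Sketch of Wegman-style adaptive sampling for distinct-value
        estimation with a fixed storage budget."""

        def __init__(self, capacity: int = 64):
            self.capacity = capacity
            self.depth = 0        # current level; sampling rate is 2**-depth
            self.sample = set()   # retained distinct values

        def insert(self, item: str) -> None:
            if _hash01(item) < 2.0 ** -self.depth:
                self.sample.add(item)
                while len(self.sample) > self.capacity:
                    # Halve the rate and purge values that no longer qualify.
                    self.depth += 1
                    self.sample = {x for x in self.sample
                                   if _hash01(x) < 2.0 ** -self.depth}

        def estimate_distinct(self) -> float:
            # Each retained value represents 2**depth distinct values.
            return len(self.sample) * 2.0 ** self.depth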
Existing random sampling methods have at least one of the following disadvantages: they 1) are a...
Random sampling is an appealing approach to building synopses of large data streams because random samp...
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such sample...
Perhaps the most flexible synopsis of a database is a random sample of the data; such samples are wi...
A variety of schemes have been proposed in the literature to speed up query processing and analytics...
Random sampling is a popular technique for providing fast approximate query answers, especially in d...
Random sampling is a well-known technique for approximate processing of large datasets. We introduce...
We consider the problem of maintaining a warehouse of sampled data that “shadows” a full-scale data...
We present a modification of the Durstenfeld-Fisher-Yates random-permutation algorithm for use in sa...
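For context, the textbook baseline being modified here draws a uniform sample of size k by running only the first k swap steps of the Durstenfeld-Fisher-Yates shuffle. The sketch below illustrates that baseline only, not the paper's modification; the function name and parameters are illustrative.

    import random

    def sample_without_replacement(items, k, rng=random):
        """Draw k items uniformly without replacement by running the first k
        steps of a Durstenfeld-Fisher-Yates shuffle (sketch)."""
        a = list(items)              # work on a copy; the prefix becomes the sample
        n = len(a)
        if not 0 <= k <= n:
            raise ValueError("k must be between 0 and len(items)")
        for i in range(k):
            j = rng.randrange(i, n)  # pick uniformly from the unshuffled suffix
            a[i], a[j] = a[j], a[i]
            # Invariant: a[:i+1] is a uniform random (i+1)-subset permutation.
        return a[:k]

Running the full loop (k = n) recovers the complete shuffle; stopping early is what makes the algorithm useful for sampling.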