Given m distributed data streams A_1, ..., A_m, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over A_1, ..., A_m. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a new tradeoff between accuracy, space usage, update speed, and applicability.
There is growing interest in algorithms for processing and querying continuous data streams (i.e., d...
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggre...
Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined o...
In this paper, we show that data streams can sometimes usefully be studied as ...
We study the problem of estimating distinct elements in the data stream model, which has a central r...
We consider the problem of estimating set-expression cardinality in a distributed streaming environm...
We consider continuous maintenance of a random sample of distinct elements from a massive data strea...
In data streaming applications, data arrives at rapid rates and in high volume, thus making ...
The analysis of massive data streams is fundamental in many monitoring applica...
We investigate the problem of estimating on the fly the frequency at which ite...
We give an improved algorithm for drawing a random sample from a large data stream when the input el...
We give the first optimal algorithm for estimating the number of distinct elements in a data stream,...
In this paper, we consider the setting of large scale distributed systems, in ...