We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1,...,n}, our algorithm computes a \((1 \pm \epsilon)\)-approximation using an optimal \(O(1/\epsilon^{-2} + log(n))\) bits of space with 2/3 success probability, where \(0<\epsilon<1\) is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream i...
This paper develops a new mathematical-statistical approach to analyze a class of Flajolet-Martin al...
Streaming algorithms must process a large quantity of small updates quickly to allow queries about t...
Summarization: There is growing interest in algorithms for processing and querying continuous data s...
The exact computation of the number of distinct elements (frequency moment F0) is a fundamental prob...
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggre...
International audienceIn this paper, we show that data streams can sometimes usefully be studied as ...
We study the problem of estimating distinct elements in the data stream model, which has a central r...
We present three algorithms to count the number of distinct elements in a data stream to within a fa...
AbstractIn data streaming applications, data arrives at rapid rates and in high volume, thus making ...
We study the distinct elements and l_p-heavy hitters problems in the sliding window model, where onl...
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answ...
We consider continuous maintenance of a random sample of distinct elements from a massive data strea...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
Given m distributed data streams A_1,..., A_m, we consider the problem of estimating the number of u...
There is growing interest in algorithms for processing and querying continuous data streams (i.e., d...
This paper develops a new mathematical-statistical approach to analyze a class of Flajolet-Martin al...
Streaming algorithms must process a large quantity of small updates quickly to allow queries about t...
Summarization: There is growing interest in algorithms for processing and querying continuous data s...
The exact computation of the number of distinct elements (frequency moment F0) is a fundamental prob...
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggre...
International audienceIn this paper, we show that data streams can sometimes usefully be studied as ...
We study the problem of estimating distinct elements in the data stream model, which has a central r...
We present three algorithms to count the number of distinct elements in a data stream to within a fa...
AbstractIn data streaming applications, data arrives at rapid rates and in high volume, thus making ...
We study the distinct elements and l_p-heavy hitters problems in the sliding window model, where onl...
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answ...
We consider continuous maintenance of a random sample of distinct elements from a massive data strea...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
Given m distributed data streams A_1,..., A_m, we consider the problem of estimating the number of u...
There is growing interest in algorithms for processing and querying continuous data streams (i.e., d...
This paper develops a new mathematical-statistical approach to analyze a class of Flajolet-Martin al...
Streaming algorithms must process a large quantity of small updates quickly to allow queries about t...
Summarization: There is growing interest in algorithms for processing and querying continuous data s...