A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The main challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol on the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this article, we present communication-efficient protocols for continuously maintaining a sample (both with and without replac...
In this dissertation, we make progress on certain algorithmic problems broadly over two computationa...
The existing random sampling methods have at least one of the following disadvantages: they 1) are a...
We consider weighted random sampling from distributed data streams presented as a sequence of mini-b...
In this paper we extend the study of algorithms for monitoring distributed data streams from whole d...
We investigate several basic problems in the distributed streaming model. In this model, we have...
Session 5B - C005. The past decade has witnessed many interesting algorithms for maintaining statistics over a data str...
We give an improved algorithm for drawing a random sample from a large data stream when the input el...
While traditional data management systems focus on evaluating single, ad hoc queries over s...
While traditional database systems optimize for performance on one-shot query processing, emerging l...
We introduce the problem of sampling from a moving window of recent items from a data strea...