We consider weighted random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our goal is to showcase its usefulness. We present and analyze a fully distributed, communication-efficient algorithm for weighted reservoir sampling in this model. An experimental evaluation on up to 256 nodes (5120 processors) shows good speedups, while theoretical analysis indicates that the algorithm should scale further to much larger machines.
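As a rough illustration of the mini-batch model (a sketch under stated assumptions, not the algorithm analyzed in this paper), the code below assumes the Efraimidis-Spirakis exponential-key technique for weighted sampling without replacement: each item of weight w gets the key -ln(U)/w for uniform U, and the sample is the k items with the smallest keys. Each processor consumes one mini-batch per step, drops items whose key is not below the current global threshold, and keeps at most k local candidates. The threshold update is simulated here by a central sort; a real distributed implementation would replace it with a communication-efficient selection. The names `LocalReservoir`, `process_minibatch`, and `current_sample` are hypothetical.

```python
import heapq
import math
import random
from typing import Hashable, List, Sequence, Tuple

Item = Tuple[Hashable, float]  # (item id, positive weight)


def exp_key(weight: float, rng: random.Random) -> float:
    """Exponential key -ln(U)/w (Efraimidis-Spirakis); smaller keys win.

    Keeping the k smallest keys yields a weighted sample without replacement.
    """
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return -math.log(u) / weight


class LocalReservoir:
    """Hypothetical per-processor state: locally held (key, id) candidates."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.candidates: List[Tuple[float, Hashable]] = []

    def ingest(self, batch: Sequence[Item], k: int, threshold: float) -> None:
        # Items whose key is not below the global threshold can never enter
        # the sample, so they are discarded locally without communication.
        for item_id, weight in batch:
            key = exp_key(weight, self.rng)
            if key < threshold:
                self.candidates.append((key, item_id))
        # Only the k smallest local keys can be among the k smallest overall.
        self.candidates = heapq.nsmallest(k, self.candidates)


def process_minibatch(pes: List[LocalReservoir],
                      batches: List[Sequence[Item]],
                      k: int, threshold: float) -> float:
    """Feed one mini-batch to each processor and return the new threshold."""
    for pe, batch in zip(pes, batches):
        pe.ingest(batch, k, threshold)
    # Global step, simulated centrally: find the k-th smallest key overall.
    # A distributed system would use a distributed selection algorithm here.
    keys = sorted(key for pe in pes for key, _ in pe.candidates)
    if len(keys) <= k:
        return threshold  # at most k items seen so far: keep all of them
    new_threshold = keys[k - 1]
    for pe in pes:
        pe.candidates = [(key, i) for key, i in pe.candidates
                         if key <= new_threshold]
    return new_threshold


def current_sample(pes: List[LocalReservoir]) -> List[Hashable]:
    """Item ids currently in the weighted reservoir, across all processors."""
    return [item_id for pe in pes for _, item_id in pe.candidates]


if __name__ == "__main__":
    # Demo: 4 processors, 10 mini-batches of 20 items each, sample size k = 5.
    pes = [LocalReservoir(seed=p) for p in range(4)]
    threshold = math.inf
    for step in range(10):
        batches = [[((p, step, j), 1.0 + j) for j in range(20)]
                   for p in range(4)]
        threshold = process_minibatch(pes, batches, k=5, threshold=threshold)
    print(current_sample(pes))
```

The design keeps communication per mini-batch independent of the batch size: only candidates whose keys beat the threshold survive locally, and only the threshold needs to be agreed upon globally. Under these assumptions, the resulting sample coincides with what a single machine would obtain by keeping the k smallest keys over the whole stream.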
In this dissertation, we make progress on certain algorithmic problems broadly over two computationa...
In this paper we show the power of sampling techniques in designing efficient distributed algorithms...
The past decade has witnessed many interesting algorithms for maintaining statistics over a data str...
We consider message-efficient continuous random sampling from a distributed stream, where the probab...
We give an improved algorithm for drawing a random sample from a large data stream when the input el...
This paper investigates parallel random sampling from a potentially-unending data stream whose eleme...
Data structures for efficient sampling from a set of weighted items are an important building block ...
A fundamental problem in data management is to draw and maintain a sample of a large data set, for a...
We consider continuous maintenance of a random sample of distinct elements from a massive data strea...