In this dissertation, we make progress on certain algorithmic problems in two computational models: the streaming model for large datasets and the distribution testing model for large probability distributions. First we consider the streaming model, where a large sequence of data items arrives one by one. The algorithm must make a single pass over this sequence, processing each item quickly and using limited space. In Chapter 2, motivated by a bioinformatics application, we consider the problem of estimating the number of low-frequency items in a stream, which has received only limited theoretical attention so far. We give an efficient streaming algorithm for this problem and show that its complexity is almost optimal. In Chapter 3 we consider ...
We investigate the problem of estimating on the fly the frequency at which ite...
We give an improved algorithm for drawing a random sample from a large data stream when the input el...
Data streams have emerged as a natural computational model for numerous applications of big data pro...
Computing functions over a distributed stream of data is a significant problem with practical applic...
Exact solutions are unattainable for important problems. The calculations are limited by the memory ...
Distribution testing is a crucial area at the interface of statistics and algorithms, where one wish...
The past decade has witnessed many interesting algorithms for maintaining statistics over a data str...
We consider weighted random sampling from distributed data streams presented as a sequence of mini-b...
This electronic version was submitted by the student author. The certified thesis is available in th...