The frequent elements problem involves processing a stream of elements and finding all elements that occur more than a given fraction of the time. A relaxed version of this problem is the ε-approximate frequent elements problem, which allows some false positives. This thesis aims to solve this problem in a parallel context, where multiple threads work together to speed up computation. Previous research has produced algorithms that can process large streams of data very quickly. However, they divide the input stream equally among the threads in the system, which results in excessive memory usage. The algorithm presented in this thesis, the Delegation Space-Saving algorithm, logically assigns ownership of certain elements to certain ...
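To make the problem statement concrete, the following is a minimal sequential sketch of the classic Space-Saving summary, which the name of the thesis algorithm suggests it builds on. It is not the Delegation Space-Saving algorithm itself and involves no threads. The function names `space_saving` and `frequent_candidates` and the parameters `phi` and `epsilon` are hypothetical, chosen only for illustration.

```python
import math


def space_saving(stream, num_counters):
    """Sequential Space-Saving summary (illustrative sketch only).

    Keeps at most `num_counters` (item, count) pairs. Each stored count
    overestimates the true frequency by at most N / num_counters, where
    N is the number of elements processed.
    """
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < num_counters:
            counts[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count plus one. A linear scan is used here
            # for brevity; real implementations use a faster structure.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def frequent_candidates(stream, phi, epsilon):
    """Return every item occurring more than phi * N times, possibly
    with false positives whose frequency exceeds (phi - epsilon) * N."""
    items = list(stream)
    counts = space_saving(items, math.ceil(1 / epsilon))
    threshold = phi * len(items)
    return {item for item, count in counts.items() if count > threshold}


if __name__ == "__main__":
    example = ["a", "b", "a", "c", "a", "d", "a", "a"]
    print(frequent_candidates(example, phi=0.5, epsilon=0.5))
```

Because stored counts never underestimate the true frequency, no element above the threshold is missed; the only error is the occasional false positive that the relaxed problem permits.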
In the current work, we derive a complete approach to optimization and automatic parallelization of ...
We consider conjunctive queries with arithmetic comparisons over multiple continuous data streams. ...
Scalable execution of continuous queries over massive data streams often requires splitting input st...
Relational algebra and SQL have been a standard in declarative analytics for decades. Yet, at web-sc...
Numerous applications in, for example, science, engineering, and financial analysis increasingly requi...
We present a deterministic parallel algorithm for the k-majority problem, that can be used to find i...
We propose an approximate integrated approach for solving both problems of finding the most popular...
Recently, several algorithms based on the MapReduce framework have been proposed for frequent patter...
Sketches are data structures designed to answer approximate queries by trading memory overhead with ...
High-performance analytical data processing systems often run on servers with large amounts of main ...
The frequent items problem is to process a stream of items and find all items occurring more than a ...
While traditional database systems optimize for performance on one-shot query processing, emerging l...
Thesis (Ph.D.)--University of Washington, 2015. The need to analyze and understand big data has change...
The exact computation of the number of distinct elements (frequency moment F0) is a fundamental prob...
We consider the problem of maintaining frequency counts for items occurring frequently in the union ...