Aggregates are rife in real-life SQL queries. However, aggregate processing has received surprisingly little attention in the parallel query processing literature; furthermore, the way current parallel database systems process aggregates is far from optimal in many scenarios. We describe two hashing-based algorithms for parallel evaluation of aggregates. A performance analysis via an analytical model and an implementation on the Intel Paragon multi-computer shows that each works well for some aggregation selectivities but poorly for the rest. Fortunately, where one does poorly the other does well, and vice versa. Thus, the two together cover all possible selectivities. We show how, using sampling, an optimizer can decide which of t...
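The abstract above is truncated and does not name its two algorithms, so the following is only an illustrative sketch under stated assumptions. It assumes the two hashing-based strategies resemble a pair common in the literature: aggregating locally on each node and then merging partial results, versus hashing raw tuples on the group key so that each group is aggregated on exactly one node. The function names, the in-memory "partitions" standing in for nodes, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict


def two_phase(partitions):
    """Assumed strategy 1: each node aggregates locally, then partials are merged.

    Tends to pay off when there are few groups, since little data is merged.
    """
    partials = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        partials.append(local)

    merged = defaultdict(int)
    for local in partials:
        for key, subtotal in local.items():
            merged[key] += subtotal
    return dict(merged)


def repartition(partitions, num_nodes):
    """Assumed strategy 2: hash raw tuples on the group key, aggregate once.

    Tends to pay off when there are many groups, since local pre-aggregation
    would shrink the data very little.
    """
    buckets = [[] for _ in range(num_nodes)]
    for part in partitions:
        for key, value in part:
            buckets[hash(key) % num_nodes].append((key, value))

    result = {}
    for bucket in buckets:
        local = defaultdict(int)
        for key, value in bucket:
            local[key] += value
        # Each key lands in exactly one bucket, so no merge step is needed.
        result.update(local)
    return result


def choose_strategy(sample_keys, threshold=0.5):
    """Pick a strategy from a sample of group keys.

    The ratio of distinct keys to sampled tuples is a rough stand-in for
    aggregation selectivity; the threshold is a hypothetical tuning knob.
    """
    if not sample_keys:
        return "two_phase"
    ratio = len(set(sample_keys)) / len(sample_keys)
    return "two_phase" if ratio < threshold else "repartition"
```

Both functions compute the same SUM per group; they differ only in where the work and the data movement happen, which is why a sampled selectivity estimate is a plausible basis for choosing between them.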
A number of execution strategies for parallel evaluation of multi-join queries have been proposed in...
Recently, Haas and Hellerstein proposed the hash ripple join algorithm in the context of online aggr...
Big data analytics often requires processing complex queries using massive parallelism, where the m...
To better support decision making, it was proposed to extend SQL to include data cube operations. Co...
Queries containing aggregate functions often combine multiple tables through join operations. This q...
Aggregations help compute summaries of a data set, which are ubiquitous in various big data analyt...
Summarization: An emerging challenge in modern distributed querying is to efficiently process mult...
In the current work, we derive a complete approach to optimization and automatic parallelization of ...
Analytical queries virtually always involve aggregation and statistics. SQL offers a wide range of f...
Aggregations are almost always done at the top of the operator tree after all selections and joins in a ...
Physical database design is important for query performance in a shared-nothing parallel database sy...
The concept of time-constrained SQL queries was introduced to address the problem of long-running SQ...
A number of execution strategies for parallel evaluation of multi-join queries have been proposed in...
Groupjoins, the combined execution of a join and a subsequent group by, are common in analytical que...
This paper describes a method for optimizing data communication and control for parallel execution...