While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans, while not always significantly improving the net running time (also known as wall-clock time) of queries. In this work, we provide algorithms for the parallel evaluation of SGF queries in MapReduce that optimize total time while retaining low net time. SGF queries can express not only all semi-join reducers but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We us...
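As a rough illustration of what evaluating a set of semi-joins in one job can look like, the following Python sketch simulates a single map/shuffle/reduce round in memory. It is not the paper's MSJ operator, whose details are not given in this abstract; the function name `multi_semi_join`, the "request"/"assert" message scheme, and the way semi-joins are specified by column positions are all assumptions made for this example.

```python
from collections import defaultdict


def multi_semi_join(relations, semi_joins):
    """Evaluate several semi-joins in one simulated MapReduce round.

    relations:  dict mapping a relation name to a list of tuples.
    semi_joins: dict mapping a semi-join name to
                (guard_name, guard_cols, cond_name, cond_cols),
                meaning: keep guard tuples whose guard_cols values
                appear in cond_cols of the conditional relation.
    All names and this encoding are hypothetical.
    """
    shuffled = defaultdict(list)  # stands in for the shuffle phase

    # Map phase, guard side: one "request" message per guard tuple and semi-join.
    for sj_name, (guard, g_cols, cond, c_cols) in semi_joins.items():
        for row in relations[guard]:
            key = (cond, c_cols, tuple(row[i] for i in g_cols))
            shuffled[key].append(("request", sj_name, row))

    # Map phase, conditional side: one "assert" message per conditional tuple,
    # emitted once per distinct (relation, columns) pair.
    emitted = set()
    for _, (_, _, cond, c_cols) in semi_joins.items():
        if (cond, c_cols) in emitted:
            continue
        emitted.add((cond, c_cols))
        for row in relations[cond]:
            key = (cond, c_cols, tuple(row[i] for i in c_cols))
            shuffled[key].append(("assert",))

    # Reduce phase: a request is satisfied if its key also received an assert.
    results = {name: [] for name in semi_joins}
    for messages in shuffled.values():
        if any(m[0] == "assert" for m in messages):
            for m in messages:
                if m[0] == "request":
                    results[m[1]].append(m[2])
    return results


relations = {
    "R": [(1, "a"), (2, "b"), (3, "c")],
    "S": [(1,), (3,)],
    "T": [("b",), ("c",)],
}
semi_joins = {
    "R_semijoin_S": ("R", (0,), "S", (0,)),  # R tuples whose 1st column is in S
    "R_semijoin_T": ("R", (1,), "T", (0,)),  # R tuples whose 2nd column is in T
}
results = multi_semi_join(relations, semi_joins)
```

Because both semi-joins share one shuffle, the guard relation R is read once for the whole set rather than once per semi-join. A Boolean combination (for example, a disjunction or a negation) could then be computed from the per-semi-join result sets in a follow-up step.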
Multi-way Theta-join queries are powerful in describing complex relations and therefore widely emplo...
SQL-on-Hadoop systems, query optimization, data distribution over multiple nodes and parallelization...
MapReduce model is a new parallel programming model initially developed for la...
Performance studies show that traditional semi-join processing methods are sometimes ine...
In this paper we present a new framework for studying parallel query optimization. We first note tha...
The problem of optimal query processing in distributed database systems was shown to be ...
Big data analytics often requires processing complex queries using massive parallelism, where the m...
For over a decade, MapReduce has been the leading programming model for parallel and massive proce...
The authors identify some optimality properties of a special type of tree queries, namel...
The properties of optimal semi-join programs for processing distributed tree queries are...
In parallelizing the join operation of database systems, a primary objective is to partition the w...
Join query is one of the most expressive and expensive data analytic tools in traditional database s...
Paper presented to the 3rd Annual Symposium on Graduate Research and Scholarly Projects (GRASP) held...