We define and explore the design space of efficient algorithms to compute ROLLUP aggregates using the MapReduce programming paradigm. Using a modeling approach, we explain the non-trivial trade-off between parallelism and communication costs that is inherent to a MapReduce implementation of ROLLUP. Furthermore, we design a new family of algorithms that, through a single parameter, make it possible to find a “sweet spot” in the parallelism vs. communication cost trade-off. We complement our work with an experimental approach, wherein we overcome some limitations of the model we use. Our results indicate that efficient ROLLUP aggregates require striking the right balance between parallelism and communication for both one-round and chain...
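To make the communication side of this trade-off concrete, here is a minimal single-process sketch of one simple way to compute a ROLLUP in a map/reduce style. It is not one of the algorithms proposed in the paper. It assumes a SUM aggregate and a toy (year, month, sales) dataset; the names `rollup_map`, `rollup_reduce`, and `rollup` are hypothetical. Each record is emitted once per grouping level, so every level can be reduced independently, at the price of shuffling d + 1 pairs per record for d dimensions.

```python
from collections import defaultdict


def rollup_map(record, num_dims):
    """Emit one (prefix, measure) pair per ROLLUP grouping level.

    For dimensions (a, b) the prefixes are (a, b), (a,), and (),
    where () stands for the grand total.
    """
    dims, measure = record[:num_dims], record[num_dims]
    for level in range(num_dims, -1, -1):
        yield tuple(dims[:level]), measure


def rollup_reduce(pairs):
    """Sum the measures that share the same grouping key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def rollup(records, num_dims):
    """Run the map and reduce steps in-process.

    Returns the aggregates and the number of pairs that a real
    MapReduce job would have to shuffle between the two steps.
    """
    pairs = [pair for rec in records for pair in rollup_map(rec, num_dims)]
    return rollup_reduce(pairs), len(pairs)


# Toy input, assumed for illustration: (year, month, sales).
records = [
    (2012, 1, 10),
    (2012, 1, 5),
    (2012, 2, 7),
    (2013, 1, 3),
]

totals, shuffled = rollup(records, num_dims=2)
print(totals[(2012,)])  # yearly subtotal for 2012
print(totals[()])       # grand total
print(shuffled)         # pairs sent through the shuffle
```

In this sketch, four input records with two dimensions produce twelve shuffled pairs. Aggregating coarser levels from finer ones in later rounds would shuffle less data but leave less work that can run in parallel, which is the kind of balance the abstract describes.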
Greedy algorithms are practitioners' best friends: they are intuitive, simple to implement, and oft...
Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominate...
Frequent Itemsets and Association Rules Mining (FIM) is a key task in knowledge discovery from data....
Data summarization queries that compute aggregates by grouping datasets across several dimensions ar...
Recently, several algorithms based on the MapReduce framework have been proposed for frequent patter...
MapReduce, being inspired by the map and reduce primitives available in many functional l...
MapReduce is a programming model from Google for cluster-based computing in domains such as search e...
Modern applications for distributed publish/subscribe systems often require stream aggregat...
Statistics about n-grams (i.e., sequences of contiguous words or other tokens in text do...
This paper presents modulo unrolling without unrolling (modulo unrolling WU), a method for message ...
Since its introduction in 2004, the MapReduce framework has become one of the standard approaches i...
From movie recommendations to fraud detection to personalized health care, there is growing need to ...
This work explores fundamental modeling and algorithmic issues arising in the well-established MapRe...
Graphs are analyzed in many important contexts, including ranking search results based on the hyperl...