We define and explore the design space of efficient algorithms to compute ROLLUP aggregates, using the MapReduce programming paradigm. Using a modeling approach, we explain the non-trivial trade-off between parallelism and communication costs that is inherent to a MapReduce implementation of ROLLUP. Furthermore, we design a new family of algorithms that, through a single parameter, allows finding a sweet spot in the parallelism vs. communication cost trade-off. We complement our work with an experimental approach, wherein we overcome some limitations of the model we use. Our results indicate that efficient ROLLUP aggregates require striking the right balance between parallelism and communication, for both one-round and chained algorithms.
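To make the trade-off concrete, here is a minimal single-round sketch of a MapReduce-style ROLLUP, simulated sequentially on toy data (the records, dimension hierarchy, and function names are illustrative, not the paper's actual algorithms). Emitting one key per prefix of the hierarchy gives maximal reduce-side parallelism, but multiplies the shuffled (communicated) data by the number of hierarchy levels — the cost the abstract refers to:

```python
from collections import defaultdict

def map_rollup(record, depth):
    """Map phase: emit one (key, value) pair per prefix of the
    dimension hierarchy. All grouping levels can then be reduced
    in parallel, at the price of duplicating each record depth+1
    times in the shuffle."""
    dims, value = record
    for level in range(depth + 1):
        yield (dims[:level], value)

def reduce_rollup(pairs):
    """Reduce phase: sum values per grouping key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical sales records: ((country, city), amount)
records = [(("FR", "Paris"), 3), (("FR", "Lyon"), 2), (("IT", "Rome"), 4)]
shuffled = [kv for r in records for kv in map_rollup(r, depth=2)]
result = reduce_rollup(shuffled)
# result[()] holds the grand total, result[("FR",)] the per-country
# subtotal, and result[("FR", "Paris")] the finest-grained aggregate.
```

A chained variant would instead shuffle only the finest level and compute coarser levels from the reducers' outputs in later rounds, trading rounds for communication — the balance the single parameter mentioned above would control.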
Data abundance poses the need for powerful and easy-to-use tools that support processing large amoun...
MapReduce is a programming model for data-parallel programs originally intended for data centers. Ma...
Frequent Itemsets and Association Rules Mining (FIM) is a key task in knowledge discovery from data....
Data summarization queries that compute aggregates by grouping datasets across several dimensions ar...
MapReduce is a data processing approach, where a single machine acts as a master, assigning map/redu...
MapReduce is a programming model from Google for cluster-based computing in domains such as search eng...
Recently, several algorithms based on the MapReduce framework have been proposed for frequent patter...
The MapReduce framework has firmly established itself as one of the most widely used parallel comput...
From movie recommendations to fraud detection to personalized health care, there is growing need to ...
Recent innovations in Big Data have enabled major strides forward in our ability to glean im...
Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominate...
A common approach in the design of MapReduce algorithms is to minimize the number of rounds. Indeed,...
This is a post-peer-review, pre-copyedit version of an article published in International Conference...