Reducing data transfer in MapReduce's shuffle phase is important because it increases the data locality of reduce tasks and thus decreases the overhead of job execution. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermediate key-value pairs are distributed over map outputs. In this paper, we address the problem of high data transfer in MapReduce and propose a technique that repartitions tuples of the input datasets, thereby optimizing the distribution of key-value pairs over mappers and increasing the data locality of reduce tasks. Our approach captures the relationships between input tuples ...
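The following is a minimal Python sketch of why input placement affects shuffle volume. It relies on simplifying assumptions that do not come from the paper. First, each input tuple emits exactly one intermediate key. Second, each key is reduced on the node that already holds most of its pairs, which is a hypothetical locality-aware reducer placement. The helper names `shuffle_transfer`, `round_robin` and `key_grouped` are illustrative only. `key_grouped` is a naive stand-in for repartitioning and is not the authors' technique.

```python
from collections import Counter, defaultdict

def shuffle_transfer(placement):
    """Count intermediate pairs that must cross the network.

    placement: list of (node, key) pairs, one per intermediate pair.
    Assumption (hypothetical): each key is reduced on the node that holds
    the most of its pairs, so only pairs produced elsewhere are transferred.
    """
    per_key = defaultdict(Counter)
    for node, key in placement:
        per_key[key][node] += 1
    transferred = 0
    for counts in per_key.values():
        transferred += sum(counts.values()) - max(counts.values())
    return transferred

def round_robin(keys, n):
    # Baseline: tuple i goes to mapper i % n, regardless of its key.
    return [(i % n, k) for i, k in enumerate(keys)]

def key_grouped(keys, n):
    # Hypothetical repartitioning: tuples sharing a key go to the same mapper.
    # A deterministic key -> mapper map avoids Python's randomized str hash.
    mapper_of = {k: i % n for i, k in enumerate(sorted(set(keys)))}
    return [(mapper_of[k], k) for k in keys]

keys = ["a", "a", "b", "b", "a", "b"]  # one emitted key per input tuple
print(shuffle_transfer(round_robin(keys, 2)))  # pairs shuffled, baseline
print(shuffle_transfer(key_grouped(keys, 2)))  # pairs shuffled, grouped
```

With these six tuples, round-robin placement shuffles 2 of the 6 pairs across the network. Grouping tuples by key shuffles none. Real inputs emit several keys per tuple, which is what makes the repartitioning problem hard.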
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
Many cloud computations process large datasets. Programming paradigms have bee...
Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data...
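The short Python sketch below makes reduce-side skew concrete by measuring per-reducer load and the ratio of the largest load to the mean. It is written under stated assumptions. Keys are integers, and `key % num_reducers` serves as a stand-in for a hash partitioner. The key counts are made up for illustration. The function names are hypothetical.

```python
def reducer_loads(key_counts, num_reducers):
    """Sum intermediate pairs per reducer.

    Assumption: integer keys partitioned by key % num_reducers,
    a simple stand-in for a hash partitioner.
    """
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[key % num_reducers] += count
    return loads

def skew_ratio(loads):
    """Largest load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) * len(loads) / sum(loads)

# Hypothetical key frequencies with one heavy key (key 0).
key_counts = {0: 50, 1: 5, 2: 5, 3: 5, 4: 5, 5: 10}
loads = reducer_loads(key_counts, 3)
print(loads, skew_ratio(loads))
```

Here reducer 0 receives 55 of the 80 pairs, giving a skew ratio of about 2.06. The job cannot finish before that reducer does.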
In the context of Hadoop, recent studies show that the shuffle operation accounts for as much as a t...
MapReduce is emerging as a prominent tool for big data processing. Data locali...
The performance of MapReduce greatly depends on its data splitting process, which happens before the ...
MapReduce is a well-known framework for distributing data-processing computations onto parallel cluste...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
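One classic heuristic for this kind of imbalance is greedy longest-processing-time-first (LPT) scheduling. The Python sketch below applies it to partition-to-reducer assignment. It is a generic textbook technique swapped in for illustration, not necessarily the algorithm studied in the paper. The partition sizes and the name `lpt_assign` are hypothetical.

```python
import heapq

def lpt_assign(partition_sizes, num_reducers):
    """Greedy LPT: give each partition, largest first, to the least-loaded reducer.

    partition_sizes: dict mapping a (hypothetical) partition id to its pair count.
    Returns (assignment dict partition -> reducer, list of reducer loads).
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Largest partitions first; ties keep their original order (stable sort).
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: -kv[1]):
        load, r = heapq.heappop(heap)
        assignment[pid] = r
        heapq.heappush(heap, (load + size, r))
    loads = [0] * num_reducers
    for pid, r in assignment.items():
        loads[r] += partition_sizes[pid]
    return assignment, loads

sizes = {"p0": 50, "p1": 5, "p2": 5, "p3": 5, "p4": 5, "p5": 10}
assignment, loads = lpt_assign(sizes, 3)
print(loads)
```

The loads come out as [50, 15, 15]. LPT spreads the small partitions evenly, but the single 50-pair partition still bounds the maximum load. This is one reason some skew-mitigation approaches also consider splitting heavy keys.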
Although MapReduce has been praised for its high scalability and fault toleran...
MapReduce is emerging as a prominent tool for big data processing. Data locality is a key f...
Whether it is for e-science or business, the amount of data produced every yea...
Hadoop is a standard implementation of the MapReduce framework for running data-intensive applications o...
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and d...
MapReduce is an effective tool for parallel data processing. One significant issue in practical MapR...
MapReduce is an effective programming model for large-scale data-intensive computing applications. H...