Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality reduces network traffic by scheduling reduce tasks on the nodes where their input data resides, whereas data skew causes load imbalance among reducer nodes. Partitioning is an important MapReduce operation because it determines the destination of each map output record and can significantly affect the amount of data shuffled. An effective partitioner can therefore improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies that consider both issues have ignored the fact that, for different types of jobs, the priority of data locality and data skew on the...
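To make the two quantities discussed above concrete, the following is a minimal sketch of how a partitioner's effect on reduce-side skew and shuffle traffic could be measured. It is not the partitioner proposed in the abstract. It uses a plain hash partitioner as a baseline, and every name in it (`hash_partition`, `reducer_loads`, `skew_ratio`, `shuffle_bytes`, the `(node, key, size)` record layout, and the `placement` mapping from reducer index to node) is a hypothetical chosen for illustration.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Baseline partitioner: a deterministic hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(map_output, num_reducers, partition=hash_partition):
    """Sum the bytes each reducer receives.

    map_output is an iterable of (node, key, size) tuples, an assumed layout:
    the node that produced the record, its key, and its size in bytes.
    """
    loads = [0] * num_reducers
    for _node, key, size in map_output:
        loads[partition(key, num_reducers)] += size
    return loads


def skew_ratio(loads):
    """Heaviest reducer load divided by the mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


def shuffle_bytes(map_output, num_reducers, placement, partition=hash_partition):
    """Bytes that must cross the network.

    placement maps a reducer index to the node it runs on (assumed input).
    A record is counted when the node that produced it differs from the node
    hosting its destination reducer.
    """
    return sum(
        size
        for node, key, size in map_output
        if placement[partition(key, num_reducers)] != node
    )
```

With these helpers, two candidate partitioners or two reducer placements can be compared on the same map output: a lower `skew_ratio` indicates better balance among reducers, and a lower `shuffle_bytes` indicates better data locality. The two goals can conflict, which is why their relative priority matters.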
MapReduce is emerging as a prominent tool for big data processing. Data locality is a key f...
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilizatio...
The MapReduce platform has been widely used for large-scale data processing and analysis re...
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data l...
Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence ...
MapReduce is emerging as a prominent tool for big data processing. Data locali...
Machine learning algorithms have the advantage of making use of the powerful Hadoop distributed comp...
Over the last ten years MapReduce has emerged as one of the staples of distributed computing both in...
The performance of MapReduce greatly depends on its data splitting process which happens before the ...
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
Many real-world areas from different sources generate big data with large volume of high velocity,...
Hadoop is a standard implementation of the MapReduce framework for running data-intensive applications o...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
Reducing data transfer in MapReduce's shuffle phase is very important because ...
Although MapReduce has been praised for its high scalability and fault toleran...