During the shuffle stage of the MapReduce framework, a large volume of data may be relocated to the same destination at the same time, which in turn can lead to the network hotspot problem. On the other hand, it is generally more effective to achieve better data locality by moving the computation closer to the data than the other way around. Doing so, however, may result in the partitioning skew problem, characterized by unbalanced computational loads among the destinations. Consequently, shuffling algorithms should consider all three criteria: data locality, partitioning skew, and network hotspots. To do so, we introduce MCSA, a Multi-Criteria Shuffling Algorithm for the MapReduce scheduling stage that rests o...
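The abstract above is truncated, so the internals of MCSA are not available here. The sketch below is only a minimal illustration, under assumed definitions, of how the three criteria could be folded into a single placement score: a weighted sum that rewards data locality and penalizes load imbalance (partitioning skew) and congested links (network hotspots). The class name, fields, and weights are hypothetical and are not taken from the paper.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative sketch only: a weighted multi-criteria score for picking a
 * reduce destination, balancing data locality, load (partitioning skew),
 * and link congestion (network hotspot). The weights, node fields, and the
 * linear scoring form are assumptions for illustration, not MCSA itself.
 */
public class MultiCriteriaPlacementSketch {

    /** A candidate reduce destination with normalized criteria in [0, 1]. */
    static final class Candidate {
        final String node;
        final double localData;   // fraction of the partition's input already on this node
        final double load;        // current load relative to the most loaded node
        final double congestion;  // inbound link utilization toward this node

        Candidate(String node, double localData, double load, double congestion) {
            this.node = node;
            this.localData = localData;
            this.load = load;
            this.congestion = congestion;
        }
    }

    // Hypothetical weights; a real scheduler would tune or learn these.
    static final double W_LOCALITY = 0.5;
    static final double W_LOAD = 0.3;
    static final double W_CONGESTION = 0.2;

    /** Higher is better: reward locality, penalize load imbalance and hot links. */
    static double score(Candidate c) {
        return W_LOCALITY * c.localData
             - W_LOAD * c.load
             - W_CONGESTION * c.congestion;
    }

    /** Pick the candidate with the best combined score. */
    static Candidate choose(List<Candidate> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble(MultiCriteriaPlacementSketch::score))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("node-1", 0.80, 0.90, 0.70),  // most local, but loaded and hot
                new Candidate("node-2", 0.40, 0.30, 0.20),  // less local, lightly loaded
                new Candidate("node-3", 0.10, 0.10, 0.10)); // remote but idle

        System.out.println("chosen destination: " + choose(candidates).node);
    }
}
```

A linear weighted sum keeps the trade-off between the three criteria explicit; an actual shuffling algorithm might instead apply per-criterion thresholds or adapt the weights at runtime.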
MapReduce is emerging as a prominent tool for big data processing. Data locali...
MapReduce has emerged as a popular programming model in the field of data-inte...
Over the past few decades, there has been a multifold increase in the amount of digital data that is being...
In the context of Hadoop, recent studies show that the shuffle operation accounts for as much as a t...
Whether it is for e-science or business, the amount of data produced every yea...
MapReduce is a parallel computing model in which a large dataset is split into smaller parts and exe...
Hadoop is a standard implementation of the MapReduce framework for running data-intensive applications o...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
MapReduce is a scalable parallel computing framework for big data processing. It exhibits m...
YARN is a popular cluster resource management platform. It does no...
This paper proposes and examines three in-memory shuffling methods designed to address problems ...
Reducing data transfer in MapReduce's shuffle phase is very important because ...
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
We consider algorithms for sorting and skew equi-join operations for computer clusters. The propose...
MapReduce has emerged as a leading programming model for data-intensive comput...