Data skew, cluster heterogeneity, and network traffic are three issues that significantly affect the performance of MapReduce applications, yet the hash partitioner in native Hadoop considers none of them. This paper proposes a new partitioner for YARN (Hadoop 2.6.0), named PIY, which adopts a parallel sampling method to estimate the distribution of the intermediate data. Based on this estimate, PIY first mitigates data skew in MapReduce applications; second, it accounts for the heterogeneity of computing resources to balance the load among reducers; third, it reduces network traffic in the shuffle phase by trying to retain intermediate data on nodes that act as both mapper and reducer. Compared with the native Hado...
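The abstract does not include PIY's implementation details. As a rough illustration of the general idea only, the sketch below shows a custom Hadoop Partitioner that assigns keys to reducers by binary search over split points produced by a sampling pass, which is one standard way a sampled key distribution can be turned into balanced partitions. The configuration key `sampled.split.points` and the class name are hypothetical; PIY's actual parallel sampling, heterogeneity weighting, and locality logic are not reproduced here.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Illustrative sketch only: a range partitioner driven by sampled split
 * points, in the spirit of sampling-based skew mitigation. This is not
 * PIY's algorithm.
 */
public class SampledRangePartitioner extends Partitioner<Text, Text>
        implements Configurable {

    // Hypothetical config key holding comma-separated split points that a
    // sampling pass over the map outputs would have produced.
    public static final String SPLIT_POINTS_KEY = "sampled.split.points";

    private Configuration conf;
    private String[] splitPoints = new String[0];

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        String raw = conf.get(SPLIT_POINTS_KEY, "");
        if (!raw.isEmpty()) {
            splitPoints = raw.split(",");
            Arrays.sort(splitPoints);
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        if (splitPoints.length == 0) {
            // Fall back to plain hash partitioning when no sample exists.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        // Binary search over the sampled boundaries maps each key to the
        // range it falls into; the sampler chooses boundaries so that the
        // ranges carry roughly equal amounts of intermediate data.
        int pos = Arrays.binarySearch(splitPoints, key.toString());
        int partition = pos >= 0 ? pos + 1 : -(pos + 1);
        return Math.min(partition, numPartitions - 1);
    }
}
```

In a driver, one would set `sampled.split.points` from the sampling pass and register the class with `job.setPartitionerClass(SampledRangePartitioner.class)`.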
Big data parallel frameworks, such as MapReduce or Spark, have been praised for...
Large quantities of data have been generated from multiple sources at exponential rates in the last ...
In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogene...
Hadoop is a standard implementation of the MapReduce framework for running data-intensive applications o...
Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data...
Although MapReduce has been praised for its high scalability and fault toleran...
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data l...
Nowadays, we are witnessing the fast production of very large amounts of data, ...
Over the last ten years, MapReduce has emerged as one of the staples of distributed computing, both in...
MapReduce is an effective tool for parallel data processing. One significant issue in practical MapR...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
As the data growth rate outpaces the processing capabilities of CPUs, reaching Petascale, tec...
MapReduce is emerging as a prominent tool for big data processing. Data locali...
MapReduce is a parallel computing model in which a large dataset is split into smaller parts and exe...
The MapReduce framework has become the de facto scheme for scalable semi-structured and unstructured...