MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data assigned to each task, which causes some tasks to take much longer to finish than others and can significantly degrade performance. This paper presents LIBRA, a lightweight strategy to address the data skew problem among the reducers of MapReduce applications. Unlike previous work, LIBRA neither requires any pre-run sampling of the input data nor prevents overlap between the map and reduce stages. It uses an innovative sampling method that achieves a highly accurate approximation of the distribution of the intermediate data by sampling only a small fraction of the int...
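The core idea of approximating the intermediate key distribution from a small sample, and then choosing reducer boundaries so each reducer receives roughly equal load, can be sketched as follows. This is a minimal hypothetical illustration of sampling-based range partitioning, not LIBRA's actual algorithm; the function names and the fixed sample rate are assumptions for the example.

```python
import random

def sample_boundaries(keys, num_reducers, sample_rate=0.01, seed=42):
    """Approximate balanced partition boundaries from a small random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(k for k in keys if rng.random() < sample_rate)
    if not sample:
        return []
    # Pick num_reducers - 1 quantile cut points from the sorted sample,
    # so each range holds roughly an equal share of the sampled keys.
    return [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]

def assign_reducer(key, boundaries):
    """Range-partition a key using the sampled boundaries (linear scan for clarity)."""
    for i, boundary in enumerate(boundaries):
        if key < boundary:
            return i
    return len(boundaries)
```

Because the boundaries track the observed key distribution rather than a hash of the key space, skewed key ranges are split across reducers instead of landing on a single one.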
MapReduce is a programming model and an associated implementation for processing and generating larg...
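The programming model described above is commonly illustrated with word count: a map function emits (word, 1) pairs, the framework groups values by key (the shuffle), and a reduce function sums the counts per word. A minimal in-memory sketch of that flow (not a distributed implementation) might look like:

```python
from collections import defaultdict
from itertools import chain

def map_fn(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_fn(word, counts):
    """Reduce: sum all partial counts for one word."""
    return (word, sum(counts))

def mapreduce(documents):
    """Run the map phase, group intermediate pairs by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_fn, documents)):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())
```

For example, `mapreduce(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. The grouping step is where data skew arises in practice: a hot key sends all of its values to a single reducer.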
Map-Reduce is a popular distributed programming framework for parallelizing computation on huge data...
This paper describes how the Hadoop framework was used to process vast amounts of data in real-time fau...
Algorithms for mitigating imbalance of the MapReduce computations are considered in this paper. Map...
MapReduce is emerging as a prominent tool for big data processing. Data locali...
Although MapReduce has been praised for its high scalability and fault toleran...
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
Nowadays, we are witnessing the rapid production of very large amounts of data, ...
Reducing data transfer in MapReduce's shuffle phase is very important because ...
Over the past few decades, there has been a multifold increase in the amount of digital data that is being...
Data skew, cluster heterogeneity, and network traffic are three issues that significantly influence ...
The performance of MapReduce greatly depends on its data splitting process, which happens before the ...
MapReduce is a parallel computing model in which a large dataset is split into smaller parts and exe...
MapReduce is a data processing approach, where a single machine acts as a master, assigning map/redu...