Matching-Based Assignment Strategies for Improving Data Locality of Map Tasks in MapReduce

Beaumont, Olivier
Lambert, Thomas
Marchal, Loris
Thomas, Bastien

Publication date

February 2017

Publisher

HAL CCSD

Abstract

MapReduce is a well-known framework for distributing data-processing computations onto parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is split into data chunks that are replicated (a constant number of times, usually 3) and distributed randomly onto computing nodes. During the Map phase, local tasks (i.e., tasks whose data chunks are stored locally) are assigned in priority when processors request tasks. In this paper, we provide the first complete theoretical analysis of data locality in the Map phase of MapReduce, and more generally, for bag-of-tasks appli...
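
The setting described in the abstract (chunks replicated a constant number of times, placed randomly, local tasks served first) can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not taken from the paper: all tasks take one time unit, every node requests one task per round, and a node with no local chunk left simply takes any remaining task. The function names `place_chunks` and `greedy_local_first` are hypothetical, and this greedy rule is not the matching-based strategy the title refers to.

```python
import random


def place_chunks(num_chunks, num_nodes, replicas=3, rng=None):
    """Store each chunk on `replicas` distinct nodes chosen uniformly at random."""
    if rng is None:
        rng = random.Random()
    return [set(rng.sample(range(num_nodes), replicas)) for _ in range(num_chunks)]


def greedy_local_first(placement, num_nodes, rng=None):
    """Simulate unit-time map tasks (one per chunk) and count non-local tasks.

    Assumption: in each round every node requests one task, in random order.
    A node takes a task whose chunk it stores if one remains; otherwise it
    takes any remaining task, which counts as non-local.
    """
    if rng is None:
        rng = random.Random()
    remaining = set(range(len(placement)))

    # local[n] holds the chunks stored on node n.
    local = [set() for _ in range(num_nodes)]
    for chunk, nodes in enumerate(placement):
        for node in nodes:
            local[node].add(chunk)

    non_local = 0
    while remaining:
        order = list(range(num_nodes))
        rng.shuffle(order)
        for node in order:
            if not remaining:
                break
            candidates = local[node] & remaining
            if candidates:
                task = min(candidates)   # local task, served with priority
            else:
                task = min(remaining)    # no local chunk left: non-local task
                non_local += 1
            remaining.remove(task)
    return non_local


if __name__ == "__main__":
    placement = place_chunks(num_chunks=1000, num_nodes=50, replicas=3,
                             rng=random.Random(0))
    count = greedy_local_first(placement, num_nodes=50, rng=random.Random(1))
    print(f"non-local tasks: {count} out of {len(placement)}")
```

Running the script with different replication factors gives a rough feel for how replication affects the number of non-local tasks in this toy model; it makes no claim about the bounds derived in the paper.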
