The MapReduce framework and its open-source implementation, Hadoop, have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. Hadoop currently allows only static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose simple yet effective schemes which use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workl...
Over the last ten years MapReduce has emerged as one of the staples of distributed computing both in...
MapReduce is emerging as an important programming model for large-scale data-parallel applications s...
Big data parallel frameworks, such as MapReduce or Spark, have been praised for...
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and d...
In this paper, we address the problem caused by fixed assignment of task slots in Hadoop MapReduce. ...
Hadoop, an open source implementation of MapReduce, uses slots to represent resource sharing. The nu...
Hadoop is an emerging framework for parallel big data processing. While becoming popular, Hadoop is ...
In this paper, we target one subset of production MapReduce workloads that contain some indep...
There is a trade-off between the number of concurrently running MapReduce jobs...
Resource capacity is often overprovisioned, primarily to deal with short periods of peak load. Shapi...
Big data is one of the fastest-growing technologies, which can handle huge amounts of data fr...
In a Hadoop cluster, the performance and the resource consumption of MapReduce j...
Although MapReduce has been praised for its high scalability and fault toleran...
In this new era of big data, even health care needs to be modernized; this includes that t...
Hadoop version 1 (HadoopV1) and version 2 (YARN) manage the resources in a distributed sy...