In this paper, we address the problem caused by the fixed assignment of task slots in Hadoop MapReduce. Manually configuring the optimal number of task slots is infeasible because workload characteristics vary. We design and implement an automatic control mechanism that dynamically assigns task slots based on the resource utilization of each TaskTracker node. The assignment takes the lag period into account. The mechanism can improve cluster-wide resource utilization and avoid contention. Experimental results show that our implementation can dynamically adjust the task slot capacity to the optimal setting at runtime. In some cases, such as Word Count, our control mechanism outperforms the current Hadoop with optimal task slots configuration ...
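The general idea of utilization-driven slot adjustment with a lag period can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `SlotController` class, its thresholds, the step size of one slot, and the reading of "lag period" as a number of samples to wait after each change are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotController:
    """Hypothetical per-node controller; every default here is illustrative."""
    slots: int = 2
    min_slots: int = 1
    max_slots: int = 8
    low: float = 0.60    # below this utilization, add a slot
    high: float = 0.90   # above this utilization, remove a slot
    lag_period: int = 2  # samples to wait after a change before acting again
    _cooldown: int = 0

    def update(self, utilization: float) -> int:
        """Consume one utilization sample in [0, 1] and return the slot count."""
        if self._cooldown > 0:
            # Assumption: a recent change is not yet visible in the
            # measurements, so hold the current setting for this sample.
            self._cooldown -= 1
            return self.slots
        if utilization < self.low and self.slots < self.max_slots:
            self.slots += 1
            self._cooldown = self.lag_period
        elif utilization > self.high and self.slots > self.min_slots:
            self.slots -= 1
            self._cooldown = self.lag_period
        return self.slots


def run(samples):
    """Feed a sequence of utilization samples to a fresh controller."""
    controller = SlotController()
    return [controller.update(u) for u in samples]
```

Waiting out the lag period after each change keeps the controller from reacting to measurements that still reflect the previous slot count, which would otherwise cause oscillation.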
Hadoop version 1 (HadoopV1) and version 2 (YARN) manage the resources in a distributed sy...
There is a trade-off between the number of concurrently running MapReduce jobs...
Resource capacity is often over provisioned to primarily deal with short periods of peak load. Shapi...
The MapReduce framework and its open source implementation Hadoop have become the de facto platform f...
Hadoop, an open source implementation of MapReduce, uses slots to represent resource sharing. The nu...
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and d...
In recent years Google’s MapReduce has emerged as a leading large-scale data processing architectur...
Hadoop is an emerging framework for parallel big data processing. While becoming popular, Hadoop is ...
Nowadays, data-intensive problems are so prevalent that numerous organizations in various industries...
Inspired by the success of Apache's Hadoop, this paper suggests a new reduce task scheduler. ...
Big data is one of the fastest-growing technologies, which can handle huge amounts of data fr...
MapReduce is a software framework for easily writing applications which process vas...
The specific choice of workload task schedulers for Hadoop MapReduce applications can hav...
MapReduce is a popular parallel programming model used to solve a wide range of Big Data applic...
In a Hadoop cluster, the performance and the resource consumption of MapReduce j...