MapReduce (MR) has been widely used to process large distributed data sets. Speculative execution is a known approach to handling slow tasks: a task running on a low-performance machine is backed up on a higher-performance one. In this paper, we address several pitfalls of the existing speculative execution strategy and take heterogeneous environments into consideration. We implemented our strategy, which is based on node classification, in Hadoop-2.6; we call the strategy Speculation-NC and the optimized Hadoop Hadoop-NC. Experimental results show that, compared with the traditional strategy, our method correctly backs up tasks, improves the performance of MRV2, and decreases execution time and resource consumption.
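The idea of backing up a slow task on a faster class of node can be illustrated with a minimal sketch. This is not the Speculation-NC implementation, and it does not use any Hadoop API: the `Node` and `Task` types, the `speed` score, the two-class split by a single `threshold`, and the remaining-time estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float  # hypothetical benchmark score; higher means faster


@dataclass
class Task:
    task_id: str
    node: Node
    progress: float  # fraction complete, between 0.0 and 1.0
    elapsed: float   # seconds since the task started


def classify(nodes, threshold):
    """Split nodes into (fast, slow) classes by an assumed speed threshold."""
    fast = [n for n in nodes if n.speed >= threshold]
    slow = [n for n in nodes if n.speed < threshold]
    return fast, slow


def remaining_time(task):
    """Estimate remaining seconds from the task's average progress rate."""
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_backup(tasks, idle_nodes, threshold):
    """Return (task, target_node) for one speculative backup, or None.

    Only tasks running on slow-class nodes are candidates, and a backup
    is placed only on an idle fast-class node.
    """
    fast_idle, _ = classify(idle_nodes, threshold)
    if not fast_idle:
        return None
    candidates = [
        t for t in tasks
        if t.node.speed < threshold and 0.0 < t.progress < 1.0 and t.elapsed > 0
    ]
    if not candidates:
        return None
    straggler = max(candidates, key=remaining_time)
    target = max(fast_idle, key=lambda n: n.speed)
    return straggler, target
```

For example, with a task at 20% progress after 100 seconds on a slow node and one idle fast node, `pick_backup` would return that task paired with the fast node; with no idle fast node it returns `None`, so no backup is launched on an equally slow machine.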
Energy consumption is an important concern for large-scale data-centers, which...
Recently, virtualization has become more and more important in the cloud computing to support effici...
Apache Spark is an open-source in-memory cluster-computing framework. Spark decomposes an applicatio...
MapReduce is a popular programming model for the purposes of processing large data sets. Speculative...
MapReduce is currently a parallel computing framework for distributed processing of large-scale data i...
MapReduce (MRV1), a popular programming model, proposed by Google, has been well used to process lar...
MapReduce is a widely used parallel computing framework for large scale data processing. The two maj...
Hadoop is a famous parallel computing framework that is applied to process large-scale data, but the...
Task stragglers dramatically impede parallel job execution of data-intensive computing in Cloud Data...
Big data is one of the fastest growing technologies, which can handle huge amounts of data fr...
Hadoop emerged as an important system for large-scale data analysis. Speculat...
Hadoop is an open source from Apache with a distributed file system and MapReduce distributed comput...
Hadoop is a well-known parallel computing system for distributed computing and large-scale data proc...