AbstractIn this paper, we import a prefetching mechanism into MapReduce model while retaining compatibility with the native Hadoop. Given a data-intensive application running on a Hadoop MapReduce cluster, our approach estimates the execution time of each task and adaptively preloads an amount of data to the memory before the new task is assigned to the computing node. We implement a predictive schedule and prefetching (PSP) mechanism, which is integrated into the native MapReduce runtime system. We also evaluate performance on a 10-node cluster using two popular benchmarks–grep and wordcount. The PSP mechanism reduces the execution time of grep and wordcount up to 28% with an average of 19%. Moreover, the PSP model increases the overall th...