In this paper, we investigate techniques to effectively orchestrate HDFS in-memory caching for Hadoop. We first evaluate the degree of benefit that each of various MapReduce applications gains from in-memory caching, a property we call cache affinity. We then propose an adaptive cache-local scheduling algorithm that adjusts the time a MapReduce job waits in the queue for a cache-local node, setting the waiting time proportional to the percentage of the job's input data that is cached. We also develop a cache-affinity-based cache replacement algorithm that decides which blocks are cached and evicted based on the cache affinity of the applications that access them. Using various workloads consisting of multiple MapReduce applications, we conduct an experimental study to ...
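The two policies sketched in the abstract can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's actual implementation: the class and function names (CacheLocalScheduler, pick_victim), the base waiting-time constant, and the per-block affinity scores are all hypothetical placeholders for whatever the real system uses.

```python
class CacheLocalScheduler:
    """Adaptive cache-local (delay) scheduling sketch: a job waits for a
    cache-local node for a time proportional to the fraction of its input
    data that is cached. base_wait_s is a hypothetical tuning constant."""

    def __init__(self, base_wait_s=3.0):
        self.base_wait_s = base_wait_s

    def max_wait(self, cached_blocks, total_blocks):
        # Waiting-time budget grows linearly with the cached fraction,
        # as the abstract describes.
        cached_fraction = cached_blocks / total_blocks if total_blocks else 0.0
        return self.base_wait_s * cached_fraction

    def should_wait(self, elapsed_s, cached_blocks, total_blocks):
        # Keep waiting for a cache-local slot until the budget is spent;
        # then fall back to a non-local assignment.
        return elapsed_s < self.max_wait(cached_blocks, total_blocks)


def pick_victim(cached_blocks):
    """Cache-affinity replacement sketch: evict the cached block whose
    owning application benefits least from caching (lowest affinity)."""
    return min(cached_blocks, key=lambda b: b["affinity"])
```

For example, a job with half of its input cached would wait up to half the base budget for a cache-local node, while a block belonging to a low-affinity application (one that gains little from caching) would be the first eviction candidate.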
Hadoop is a framework for storing and processing huge volumes of data on clusters. It uses Hadoop Di...
Part 2: Parallel and Multi-Core Technologies. As a widely used programming model...
A slightly revised version of this work is published in the Proceedings of the 24th IEEE Internation...
Many analytic applications built on Hadoop ecosystem have a propensity to iteratively perform repeti...
Data is being generated at an enormous rate, due to online activities and use of resources related t...
The demand for highly parallel data processing platform was growing due to an explosion in the numbe...
Abstract—The MapReduce platform has been widely used for large-scale data processing and analysis re...
The Big-data refers to the huge scale distributed data processing applications that operate on unusu...
Abstract The buzz-word big-data refers to the large-scale distributed data processing applications t...
Running big data analytics frameworks in the cloud is becoming increasingly important, but their res...
Abstract: The buzz-word big-data refers to the large-scale distributed data processing applications ...
The Hadoop Distributed File System (HDFS) is a network file system used to support multiple widely-u...
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and d...
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilizatio...
The use of computational platforms such as Hadoop and Spark is growing rapidly as a successful parad...