In this work, we present a set of techniques that considerably improve the performance of executing concurrent MapReduce jobs. Our proposed solution relies on proper resource allocation for concurrent Hive jobs based on data dependency, inter-query optimization, and modeling of Hadoop cluster load. To the best of our knowledge, this is the first work on Hive/MapReduce job optimization that takes Hadoop cluster load into consideration. We perform an experimental study that demonstrates a 233% reduction in execution time for the concurrent versus the sequential execution scheme. We report up to 40% further reduction in execution time for concurrent job execution after resource-usage optimization. The results reported in this paper were obtained in a...
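The concurrent-versus-sequential comparison described in this abstract can be illustrated with a minimal sketch. This is not the authors' system: the job runner below is a hypothetical stand-in (real Hive jobs would be submitted through the Hive CLI or HiveServer2), and the simulated durations only show why overlapping independent jobs shortens total wall-clock time when the cluster has spare capacity.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_job(duration):
    # Hypothetical stand-in for a Hive/MapReduce job; a real job would
    # block on cluster scheduling and I/O rather than sleeping.
    time.sleep(duration)
    return duration

jobs = [0.2, 0.2, 0.2]  # simulated durations of three independent jobs

# Sequential scheme: total time is roughly the sum of job durations.
start = time.perf_counter()
for d in jobs:
    run_job(d)
sequential = time.perf_counter() - start

# Concurrent scheme: jobs overlap, so total time is bounded by the
# longest job (assuming the cluster can run all of them at once).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    list(pool.map(run_job, jobs))
concurrent = time.perf_counter() - start

print(sequential > concurrent)  # → True
```

In practice the gap is smaller than this idealized sketch suggests, because concurrent jobs contend for the same map/reduce slots and HDFS bandwidth, which is exactly the cluster-load effect the abstract says the proposed resource allocation accounts for.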
This study proposes an improvement and implementation of an enhanced Hadoop MapReduce workflow that deve...
As the era of “big data” has arrived, more and more companies start using distributed file systems t...
The underlying assumption behind Hadoop and, more generally, the need for distributed processing is ...
In this work, we present a set of techniques that considerably improve the performance of executing ...
Hadoop version 1 (HadoopV1) and version 2 (YARN) manage the resources in a distributed sy...
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on l...
One of the most widely used frameworks for programming MapReduce-based applications is Apac...
Big data is an emerging concept involving complex data sets which can give new insight and distill n...
In the present-day scenario, the cloud has become an inevitable need for the majority of IT operational organizat...
The performance of the MapReduce-based Cloud data warehouses mainly depends on the virtual hardware ...
As a core component of Hadoop, an open cloud platform, MapReduce is a distributed an...
MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoo...
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and d...
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilizatio...