In this work, we present a set of techniques that considerably improve the performance of executing concurrent MapReduce jobs. Our proposed solution relies on proper resource allocation for concurrent Hive jobs based on data dependency, inter-query optimization and modeling of Hadoop cluster load. To the best of our knowledge, this is the first work towards Hive/MapReduce job optimization which takes Hadoop cluster load into consideration. We perform an experimental study that demonstrates 233% reduction in execution time for concurrent vs sequential execution schema. We report up to 40% extra reduction in execution time for concurrent job execution after resource usage optimization. The results reported in this paper were obtai...
Big data is an emerging concept involving complex data sets which can give new insight and distill n...
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilizatio...
A Hive table is one of the big data tables that rely on structured data. By default, it stores the ...
Hadoop version 1 (HadoopV1) and version 2 (YARN) manage the resources in a distributed sy...
As the era of “big data” has arrived, more and more companies are starting to use distributed file systems t...
In the present-day scenario, the cloud has become an inevitable need for the majority of IT operational organizat...
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on l...
One of the most widely used frameworks for programming MapReduce-based applications is Apac...
The performance of the MapReduce-based Cloud data warehouses mainly depends on the virtual hardware ...
MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoo...
As a core component of Hadoop, an open cloud platform, MapReduce is a distributed an...
Apache Hadoop has provided solutions to the obstacles related to Big Data processing. Hadoop sto...