MapReduce is a popular model for executing time-consuming analytical queries as a batch of tasks on large-scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can remarkably reduce the total execution time of all queries. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse system based on MapReduce. Our framework, SharedHive, transforms a set of correlated HiveQL queries into new global queries that can produce the same results in remarkably smaller total execution times. It is experimentally shown that Sha...
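To make the idea of sharing a scan concrete, the following is a minimal sketch, not SharedHive's actual algorithm. It assumes a deliberately simplified query model (one source table, one filter, one destination table), and the `Query` class, the `merge_by_source` function, and all table and column names are hypothetical. Queries that read the same table are combined into a single Hive multi-insert statement, so the table is scanned once for the whole group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A simplified single-table query (hypothetical model for illustration)."""
    source: str  # table that is scanned
    select: str  # projection list
    where: str   # filter predicate
    dest: str    # table that receives the result


def merge_by_source(queries):
    """Combine queries that scan the same table into one multi-insert statement."""
    groups = defaultdict(list)
    for q in queries:
        groups[q.source].append(q)

    merged = []
    for source, group in groups.items():
        # One FROM clause per group: the shared scan.
        parts = [f"FROM {source}"]
        for q in group:
            parts.append(
                f"INSERT OVERWRITE TABLE {q.dest} SELECT {q.select} WHERE {q.where}"
            )
        merged.append("\n".join(parts))
    return merged


# Illustrative input: two queries share the `sales` table, one reads `customers`.
queries = [
    Query("sales", "order_id, quantity", "quantity > 40", "q1_out"),
    Query("sales", "order_id, discount", "discount < 0.05", "q2_out"),
    Query("customers", "customer_id", "country = 'TR'", "q3_out"),
]

if __name__ == "__main__":
    for statement in merge_by_source(queries):
        print(statement, end="\n\n")
```

In this sketch the three input queries become two statements, and the two queries over `sales` share one `FROM sales` clause. Detecting correlation between real HiveQL queries (joins, aggregations, overlapping subexpressions) is considerably more involved than grouping by table name.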
Today’s data deluge enables organizations to collect massive data, and analyze it with an ever-incre...
Apache Hadoop is an open source framework that deals with the distributed computing of large dataset...
Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain act...
SQL-on-Hadoop systems, query optimization, data distribution over multiple nodes and parallelization...
In this work, we present a set of techniques that considerably improve the performance of executing ...
This research proposes a novel runtime system, Habanero Hadoop, to tackle the inefficient utilizatio...
While services such as Amazon AWS make computing power abundantly available, adding more computing n...
The size of data coming from various sources has increased rapidly. Within a few seconds, terabytes of data is...
This paper researches Hive performance optimization, mainly from the two aspects of MapReduce schedulin...
The performance of the MapReduce-based Cloud data warehouses mainly depends on the virtual hardware ...
Apache Hive is a widely used data warehouse system for Apache Hadoop, and has been adopted ...
SQL query processing for analytics over Hadoop data has recently gained significant traction. Among ...
As the era of “big data” has arrived, more and more companies start using distributed file systems t...