Data analysts need to understand the quality of data in the warehouse. This is often done by issuing many Group By queries on the sets of columns of interest. Since the volume of data in these warehouses can be large, and tables in a data warehouse often contain many columns, this analysis typically requires executing a large number of Group By queries, which can be expensive. We show that the performance of today’s database systems for such data analysis is inadequate. We also show that the problem is computationally hard, and develop efficient techniques for solving it. We demonstrate significant speedup over existing approaches on today’s commercial database systems.
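To make the workload concrete, the following is a minimal sketch of the naive approach the abstract describes: one Group By query per column set of interest. It is an illustration only, not the technique developed in the paper. The table `customers`, its columns, and the helper `profile_column_sets` are hypothetical names chosen for the example, and SQLite stands in for a warehouse engine.

```python
import sqlite3
from itertools import combinations


def profile_column_sets(conn, table, columns, max_size=2):
    """Run one GROUP BY query per column set and record its number of groups.

    Hypothetical helper: identifiers are interpolated directly into the SQL,
    so it should only be called with trusted table and column names.
    """
    results = {}
    for size in range(1, max_size + 1):
        for cols in combinations(columns, size):
            col_list = ", ".join(f'"{c}"' for c in cols)
            # Each column set costs a separate scan and aggregation.
            query = (
                "SELECT COUNT(*) FROM "
                f'(SELECT {col_list} FROM "{table}" GROUP BY {col_list})'
            )
            results[cols] = conn.execute(query).fetchone()[0]
    return results


# Tiny made-up table, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Seattle", "WA"),
        (2, "Seattle", "WA"),
        (3, "Portland", "OR"),
        (4, "Portland", "ME"),
    ],
)

group_counts = profile_column_sets(conn, "customers", ["id", "city", "state"])
for cols, n_groups in group_counts.items():
    print(cols, n_groups)
```

Even with three columns and column sets of size at most two, the sketch issues six separate queries. The number of column sets grows combinatorially with the number of columns, which is why this style of analysis becomes expensive on wide tables.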
Cluster computation power provides a promising way to improve response time in large data warehouses...
Data Warehouses are databases used in Business Intelligence systems as a data source to ...
Users and administrators of large-scale infrastructures (e.g., datacenters and PlanetLab) are freque...
The MapReduce model is a new parallel programming model initially developed for la...
Data analysis applications in areas as diverse as remote sensing and telepathology require operating...
In this paper, we define and examine a particular class of queries called group queries. Group queri...
Some aggregate and grouping queries are conceptually simple, but difficult to express in SQL. This d...
In the current work, we derive a complete approach to optimization and automatic parallelization of ...
This paper presents and evaluates a simple but very effective method to implement large data warehou...
Queries containing aggregate functions often combine multiple tables through join operations. This q...
The skyline operator was first proposed in 2001 for retrieving interesting tuples from a dataset. Si...
Multi-relational data mining algorithms search a large hypothesis space in order to find a suitable ...
Some recently proposed extensions to relational database systems as well as deductive database syste...