Aggregate query processing over very large datasets can be slow and prone to error due to dirty (missing, erroneous, duplicated, or corrupted) values. To address the speed issue, there has lately been a resurgence of interest in sampling-based approximate query processing, but this approach further reduces answer quality by introducing sampling error. In this paper, we explore an intriguing opportunity that sampling presents, namely, that when integrated with data cleaning, sampling actually improves answer quality. Data cleaning requires either domain-specific software (which can be costly and time-consuming to develop) or human inspection. The latter is increasingly feasible with crowdsourcing but can be highly inefficient for large da...
Big data is now widely used and rapidly developing. Research in the big data area is mean...
Modern data analytics applications typically process massive amounts of data on clusters of tens, hu...
In this paper, we present BlinkDB, a massively parallel, sampling-based approximate query engine for...
In emerging Big Data scenarios, obtaining timely, high-quality answers to aggregate queries is diffi...
An important obstacle to accurate data analytics is dirty data in the form of missing, duplicate, in...
The information managed in emerging applications, such as location-based services, sensor networks, an...
A large part of the data on the World Wide Web resides in the deep web. Executing structured, high-l...
Organizations collect a substantial amount of user data from multiple sources to explore such data ...
One response to the proliferation of large datasets has been to develop ingenious ways to throw reso...
Materialized views (MVs), stored pre-computed results, are widely used to facilitate fast queries on...
Uncertain or imprecise data are pervasive in applications like location-based services, sensor monit...
Incomplete data is ubiquitous. When a user issues a query over incomplete data, the results may cont...
In decision support applications, the ability to provide fast approximate answers to aggregation que...
Decision support queries usually involve accessing enormous amounts of data requiring significant ret...
In many emerging applications, such as sensor networks, location-based services, and data integrati...