Big data is now widely used and developing rapidly. Research on big data is valuable because such data carries many kinds of information. Answering aggregation queries is also important in both research and commercial settings. In this paper we introduce a sampling method for answering aggregation queries on realistic massive data with a controlled relative error bound. We use JSON data as the experimental material, which distinguishes this work from existing related research. Wikipedia records stored as large JSON files provide a realistic data environment, which makes the results meaningful and trustworthy. We utilize the Wikipedia big JSON file, process the data, and modify and adapt the sampling algorithm with given relative ...
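The idea of answering an aggregation query by sampling until a relative error target is met can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a generic sequential-sampling estimator for a COUNT query, assuming records are already parsed from JSON into dictionaries. The function name `approximate_count`, the field `is_redirect`, and the parameters `rel_error`, `z`, and `batch` are all hypothetical, and the stopping rule relies on a normal approximation, so the bound holds only approximately.

```python
import math
import random


def approximate_count(records, predicate, rel_error=0.05, z=1.96, batch=500, seed=0):
    """Estimate how many records satisfy `predicate` by random sampling.

    Hypothetical helper: samples without replacement and stops once the
    normal-approximation half-width of the estimated proportion, divided by
    the proportion itself, drops to `rel_error` or below.
    Returns (estimated_count, number_of_records_inspected).
    """
    n_total = len(records)
    if n_total == 0:
        return 0.0, 0

    rng = random.Random(seed)
    order = list(range(n_total))
    rng.shuffle(order)  # a random permutation gives sampling without replacement

    hits = 0
    seen = 0
    for idx in order:
        seen += 1
        if predicate(records[idx]):
            hits += 1
        # Check the stopping rule only once per batch to keep the loop cheap.
        if seen % batch == 0 and hits > 0:
            p = hits / seen
            half_width = z * math.sqrt(p * (1.0 - p) / seen)
            if half_width / p <= rel_error:
                break

    # Scale the sample proportion up to the full data set.
    return n_total * hits / seen, seen


if __name__ == "__main__":
    # Synthetic stand-in for parsed JSON records; one in four is a "redirect".
    data = [{"id": i, "is_redirect": i % 4 == 0} for i in range(200_000)]
    estimate, inspected = approximate_count(data, lambda r: r["is_redirect"])
    print(f"estimated count: {estimate:.0f} after inspecting {inspected} records")
```

If the predicate is very selective, the loop may scan most or all of the data before the target is reached. When it scans everything, the returned count is exact.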
We study a novel solution to executing aggregation (and specifically COUNT) queries over large-scale...
Data proliferation makes big data analysis a challenging task. One way to address the issue is to ut...
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest resea...
A large part of the data on the World Wide Web resides in the deep web. Executing structured, high-l...
One response to the proliferation of large datasets has been to develop ingenious ways to throw reso...
The era of Internet of Things and big data has seen individuals, businesses, and organizations incre...
Aggregation queries are at the core of business intelligence and data analytics. In the big data era...
Aggregate query processing over very large datasets can be slow and prone to error due to dirty (mis...
Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of docume...
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
Wikipedia has been built to gather encyclopedic knowledge using a collaborative social process that ...
Big Data are generally huge quantities of digital information accrued automatically and/or merged fr...
One of the biggest research challenges in KDD and Data Mining is to develop methods that scale up w...
In this paper, we address the problem of selectivity estimation in a crowdsourced database. Specific...
The amount of data being generated and stored is growing exponentially, owed in part to the continui...