One of the biggest research challenges in KDD and Data Mining is to develop methods that scale up well to large amounts of data. One possible approach to achieving scalability is to take a random sample and perform data mining on it. In this paper, we propose an adaptive sampling method that solves a variety of data mining tasks arising in practice on very large data. Our algorithms are adaptive in the sense that they determine from the data whether they have already seen enough of it to reach a reliable conclusion. We prove the correctness of our method, estimate its efficiency theoretically, and demonstrate its efficiency experimentally on a concrete task requiring sampling.
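To make the adaptive stopping idea concrete, here is a minimal sketch. It is not the method proposed in the paper. It assumes every observation lies in [0, 1] and uses a Hoeffding bound, with the confidence budget δ spread over all sample sizes, to decide whether an unknown mean lies above or below a threshold. The function names `hoeffding_radius` and `adaptive_threshold_test` and the parameters `delta` and `max_samples` are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def hoeffding_radius(n: int, delta: float) -> float:
    """Half-width of a Hoeffding interval for the mean of n values in [0, 1].

    The budget delta is split as delta / (n * (n + 1)) over all n; these terms
    sum to delta, so the interval stays valid even though we check after every draw."""
    delta_n = delta / (n * (n + 1))
    return math.sqrt(math.log(2.0 / delta_n) / (2.0 * n))


def adaptive_threshold_test(
    draw: Callable[[], float],
    threshold: float,
    delta: float = 0.05,
    max_samples: int = 1_000_000,
) -> Tuple[bool, int, float]:
    """Hypothetical helper: decide whether the mean of draw() exceeds threshold,
    stopping as soon as the confidence interval excludes the threshold.

    Returns (mean_is_above, samples_used, running_mean)."""
    total, n = 0.0, 0
    while n < max_samples:
        total += draw()
        n += 1
        mean = total / n
        r = hoeffding_radius(n, delta)
        if mean - r > threshold:  # confidently above the threshold
            return True, n, mean
        if mean + r < threshold:  # confidently below the threshold
            return False, n, mean
    # Sample budget exhausted: fall back to the point estimate (no guarantee).
    mean = total / n
    return mean > threshold, n, mean


# Usage: Bernoulli sources with mean 0.9 (easy) and 0.55 (hard) against 0.5.
rng = random.Random(0)
far = adaptive_threshold_test(lambda: float(rng.random() < 0.9), threshold=0.5)
near = adaptive_threshold_test(lambda: float(rng.random() < 0.55), threshold=0.5)
print(far[1], near[1])  # sample sizes used; the easy case typically needs far fewer
```

The sample size is not fixed in advance: when the true mean is far from the threshold (0.9 vs. 0.5), the loop typically stops after fewer than a hundred draws, while a mean close to it (0.55) requires thousands. This data-dependent stopping is what distinguishes adaptive sampling from committing to a worst-case sample size up front.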
Adaptive sampling [a1] is a probabilistic algorithm invented by M. Wegman (unpublished) around 1980....
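Wegman's adaptive sampling is usually presented as a way to estimate the number of distinct elements in a large file using bounded memory. The sketch below follows that commonly described scheme under stated assumptions: items are hashed to 64-bit integers with BLAKE2b (an arbitrary choice), a hash is kept only if its `depth` lowest bits are zero, and whenever the buffer exceeds `capacity` the depth is incremented and the buffer is filtered. The estimate is the buffer size times 2^depth. The names `adaptive_sampling_distinct`, `_hash64`, and `capacity` are hypothetical.

```python
import hashlib
from typing import Iterable


def _hash64(item: str) -> int:
    # Stable 64-bit hash (BLAKE2b, an arbitrary choice for this sketch), so results
    # do not depend on Python's per-process randomized hash().
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def adaptive_sampling_distinct(items: Iterable[str], capacity: int = 64) -> int:
    """Hypothetical helper: estimate the number of distinct items in one pass.

    A hash is kept only if its `depth` lowest bits are zero, so each distinct
    value survives with probability 2**-depth. On overflow, depth is increased
    and the buffer is filtered, keeping memory at most `capacity` hashes."""
    depth = 0
    sample = set()
    for item in items:
        h = _hash64(item)
        if h & ((1 << depth) - 1) == 0:  # passes the current sampling filter
            sample.add(h)
            while len(sample) > capacity:  # overflow: halve the sampling rate
                depth += 1
                mask = (1 << depth) - 1
                sample = {x for x in sample if x & mask == 0}
    # Scale the surviving sample back up by the inverse sampling rate.
    return len(sample) * 2 ** depth


# Usage: 10,000 distinct user IDs, each seen three times.
stream = (f"user-{i % 10_000}" for i in range(30_000))
estimate = adaptive_sampling_distinct(stream, capacity=256)
print(estimate)  # an approximation of 10,000, not the exact count
```

With no more than `capacity` distinct values the depth never increases and the count is exact (up to hash collisions); beyond that, the estimate becomes more accurate as `capacity` grows, which is the storage/accuracy trade-off of the method.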
Big data processing is the new challenge for analytical and machine learning techniques. Many efforts a...
A sequential sampling algorithm or adaptive sampling algorithm is a sampling algorithm that ...
One of the biggest research challenges in KDD and Data Mining is to develop methods that scale up w...
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest resea...
Complex search and data analysis can be automated using Knowledge Discovery. Extracting hi...
The amount of data being generated and stored is growing exponentially, owing in part to the continui...
As data warehouses grow to the point where one hundred gigabytes is considered small, the computatio...
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorith...
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorith...
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of larg...
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
The era of Internet of Things and big data has seen individuals, businesses, and organizations incre...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
In this paper we propose a scaling-up method that is applicable to essentially any induction algorit...