Abstract. As databases grow, machine learning algorithms face more demanding requirements for efficiency and accuracy. Rather than analyzing all the data, an alternative is to use only a small subset and try to achieve competitive results with fewer resources. Progressive sampling follows this approach: it starts with a small sample of data and increases the sample size progressively until the accuracy (or another performance measure) can no longer be improved. The aim is to match the accuracy obtained with all the data while using significantly fewer resources. This paper introduces a new progressive sampling algorithm called NSC which: (i) defines an initial data set with a simple-to-evaluate measure and close t...
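The generic progressive sampling loop described in the abstract can be sketched as follows. This is a minimal illustration only, not the NSC algorithm: the geometric schedule, the function names (`geometric_schedule`, `progressive_sampling`, `toy_curve`), the starting size `n0`, the growth `factor`, and the stopping tolerance `tol` are all assumptions introduced here, and the learner is replaced by a synthetic learning curve.

```python
import math
from typing import Callable, List, Tuple


def geometric_schedule(n_total: int, n0: int = 100, factor: float = 2.0) -> List[int]:
    """Sample sizes n0, n0*factor, ... capped at n_total (assumed schedule)."""
    sizes = []
    n = float(n0)
    while n < n_total:
        sizes.append(int(n))
        n *= factor
    sizes.append(n_total)  # the last step always uses all the data
    return sizes


def progressive_sampling(
    n_total: int,
    evaluate: Callable[[int], float],
    n0: int = 100,
    factor: float = 2.0,
    tol: float = 0.005,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Grow the sample until the accuracy gain drops below tol.

    evaluate(n) is a hypothetical callback that trains on a sample of
    size n and returns an accuracy estimate.
    """
    history = []
    best_n, best_acc = 0, -math.inf
    for n in geometric_schedule(n_total, n0, factor):
        acc = evaluate(n)
        history.append((n, acc))
        if acc - best_acc < tol:
            break  # no meaningful improvement: keep the previous size
        best_n, best_acc = n, acc
    return best_n, best_acc, history


def toy_curve(n: int) -> float:
    """Synthetic learning curve standing in for a real train/test run."""
    return 0.9 - 0.5 / math.sqrt(n)


if __name__ == "__main__":
    n, acc, hist = progressive_sampling(100_000, toy_curve)
    print(f"stopped at n={n}, accuracy={acc:.4f}, steps={len(hist)}")
```

In practice `evaluate` would draw a random sample of the given size, fit the learner, and score it on held-out data; the simple threshold test used here is only one possible convergence check.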
One of the biggest research challenges in KDD and Data Mining is to develop methods that scale up w...
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest resea...
We present PANENE, a progressive algorithm for approximate nearest neighbor in...
Given a large data set and a classification learning algorithm, Progressive Sampling (PS) uses incre...
In data mining, sampling has often been suggested as an effective tool to reduce the size of the dat...
Although large training sets are supposed to improve the performance of learni...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
Clustering algorithms become more and more sophisticated to cope with large da...
We present a heuristic meta-learning search method for finding a set of optimized algorithmic parame...
Nowadays, the major challenge in machine learning is the ‘Big Data’ challenge. The big data problems...
Data science requires time-consuming iterative manual activities. In particular, activities such as ...
The amount of data being generated and stored is growing exponentially, owed in part to the continui...
As data warehouses grow to the point where one hundred gigabytes is considered small, the computatio...
Adaptive sampling [a1] is a probabilistic algorithm invented by M. Wegman (unpublished) around 1980....