The paper proposes a heuristic instance reduction algorithm as an approach to machine learning and knowledge discovery in centralized and distributed databases. The algorithm is based on an original method for selecting reference instances and produces a reduced training dataset, which can serve as input to the machine learning algorithms used for data mining tasks. For each instance in the dataset, the algorithm calculates the value of a similarity coefficient. These values are used to group instances into clusters. The number of clusters depends on the value of the so-called representation level, which is set by the user. Out of each cluster only a limited number ...
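The steps described in the abstract (compute a coefficient per instance, group instances by coefficient value, keep a limited number per group) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the similarity coefficient, so the formula below (dot product of an instance with the column sums) is a placeholder assumption. Grouping separately per class label and the parameter names `per_cluster` and `decimals` are likewise hypothetical, and the representation level is not modeled at all.

```python
from collections import defaultdict


def similarity_coefficients(rows):
    """Placeholder coefficient (an assumption, not taken from the paper):
    dot product of each instance with the vector of column sums."""
    n_attrs = len(rows[0])
    col_sums = [sum(r[j] for r in rows) for j in range(n_attrs)]
    return [sum(r[j] * col_sums[j] for j in range(n_attrs)) for r in rows]


def reduce_instances(rows, labels, per_cluster=1, decimals=6):
    """Return sorted indices of the instances kept in the reduced training set.

    Instances with the same class label and the same (rounded) coefficient
    value form one cluster; at most `per_cluster` instances are kept from
    each cluster. Both parameters are hypothetical names for this sketch.
    """
    coeffs = similarity_coefficients(rows)
    clusters = defaultdict(list)
    for idx, (coeff, label) in enumerate(zip(coeffs, labels)):
        clusters[(label, round(coeff, decimals))].append(idx)

    selected = []
    for members in clusters.values():
        # Keep the first few members; a real method would choose more carefully.
        selected.extend(members[:per_cluster])
    return sorted(selected)


# Toy data: instances 0 and 1 are identical, so they share one cluster.
rows = [[0, 1], [0, 1], [1, 0], [1, 1]]
labels = ["a", "a", "b", "b"]
print(reduce_instances(rows, labels))
```

With `per_cluster=1` the duplicate instance is dropped; raising `per_cluster` keeps more instances per cluster at the cost of a larger training set.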
Two disadvantages of the standard nearest neighbor algorithm are 1) it must store all the instances ...
The active selection of instances can significantly improve the generalisation performance of a le...
We present a method for automatically clustering similar attribute values in a database system spann...
Instance reduction techniques are data preprocessing methods originally developed to enhance the nea...
Multi-database mining is an urgent task. This thesis solves four key problems in multi-database minin...
Unlike the traditional supervised learning, multiple-instance learning (MIL) deals with learning fro...
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, be...
Storing and using specific instances improves the performance of several supervised learning algorit...
The main focus of my research is to design effective learning techniques for information retrieval a...
In instance-based learning, a training set is given to a classifier for classifying new instances. I...
Machine-learning research studies and applies the computer modeling of learning processes in their...
This research report contains a short overview of the most frequently used Machine Learning methods, and i...
Finding an efficient data reduction method for large-scale problems is an imperative task. In this p...