The efficiency of the otherwise expedient decision tree learning can be impaired in processing data-mining-sized data if superlinear-time processing is required in attribute selection. An example of such a technique is optimal multisplitting of numerical attributes. Its efficiency is hit hard even by a single troublesome attribute in the domain. Analysis shows that there is a direct connection between the ratio of the number of boundary points to the number of training examples and the maximum goodness score of a numerical attribute. Class distribution information from preprocessing can be applied to obtain tighter bounds for an attribute's relevance in class prediction. These analytical bounds, however, are too loose for practical purposes. We experime...
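The ratio of boundary points to training examples mentioned above can be illustrated with a minimal sketch. It assumes the common definition of a boundary point (a cut between two adjacent distinct attribute values, unless the examples on both sides all belong to one and the same class); the abstract itself does not spell out a definition, and the function names `boundary_points` and `boundary_ratio` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def boundary_points(values, labels):
    """Return midpoints between adjacent distinct values that are boundary points.

    Assumed definition: a cut between two adjacent value bins is a boundary
    point unless both bins contain a single class and that class is the same.
    """
    # Collect the set of class labels seen for each distinct attribute value.
    bins = defaultdict(set)
    for v, y in zip(values, labels):
        bins[v].add(y)

    ordered = sorted(bins)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        # The cut is skipped only when both bins are pure and of the same class.
        same_pure_class = len(bins[lo]) == 1 and bins[lo] == bins[hi]
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def boundary_ratio(values, labels):
    """Number of boundary points divided by the number of training examples."""
    if not values:
        raise ValueError("at least one training example is required")
    return len(boundary_points(values, labels)) / len(values)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 5]
    ys = ["a", "a", "b", "b", "a", "b"]
    print(boundary_points(xs, ys))  # [2.5, 4.5]
    print(boundary_ratio(xs, ys))
```

In this toy sample, six examples yield two boundary points. The sketch only computes the ratio; it does not reproduce the bounds on the goodness score that the abstract refers to.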
Dynamic programming has been studied extensively, e.g., in computational geometry and string matchin...
We focus on developing improvements to algorithms that generate decision trees from training data. T...
Classification is a widely used technique in the data mining domain, where scalability and efficienc...
Often in supervised learning numerical attributes require special treatment and do not fit the learn...
We consider multisplitting of numerical value ranges, a task that is encountered as a disc...
Numerical data poses a problem to symbolic learning methods, since numerical value ranges inherently...
Real-life problems handled by machine learning deal with various forms of values in the data set at...
To date, attribute discretization is typically performed by replacing the original set of c...
We propose a new method for discretization, which uses clustering to determine candidate boundaries....
Data engineering is generally considered to be a central issue in the development of data mining app...
Many learning algorithms make an implicit assumption that all the attributes of the presented data a...
Decision trees are a very general computation model. Here the problem is to identify a Boolean funct...
We evaluate the power of decision tables as a hypothesis space for supervised learning alg...
An algorithm for learning decision trees for classification and prediction is described which conver...