Abstract. In this paper we propose a similarity measure for heterogeneous data. The key idea is to define the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar than or equally similar to it, with respect to order relations defined appropriately for each data type. Similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying the method in combination with distance-based clustering to real data shows its merit.
Key words: data mining, similarity measures, heterogeneous data, order relations, probability integration.
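To make the key idea concrete, the following is a minimal sketch for a single numeric attribute, not the paper's actual procedure. It assumes that value pairs are drawn uniformly, with replacement, from the observed values, and that one pair is "less similar than or equally similar to" another when its absolute difference is at least as large. The names `pair_similarity` and `numeric_order` are hypothetical, and the integration of attribute-level similarities into object-level similarities is not shown because the abstract does not specify the statistical method used.

```python
from itertools import product


def numeric_order(candidate, reference):
    """Assumed order relation for numeric data (not taken from the paper):
    `candidate` is less similar than or equally similar to `reference`
    when its absolute difference is at least as large."""
    return abs(candidate[0] - candidate[1]) >= abs(reference[0] - reference[1])


def pair_similarity(x, y, values, less_or_equally_similar):
    """Empirical probability that a value pair drawn at random from
    `values` (uniformly, with replacement) is less similar than or
    equally similar to the pair (x, y)."""
    pairs = list(product(values, repeat=2))
    hits = sum(1 for pair in pairs if less_or_equally_similar(pair, (x, y)))
    return hits / len(pairs)


# Hypothetical observed values of one numeric attribute.
ages = [20, 22, 25, 40]

print(pair_similarity(20, 20, ages, numeric_order))  # 1.0
print(pair_similarity(22, 25, ages, numeric_order))  # 0.625
print(pair_similarity(20, 40, ages, numeric_order))  # 0.125
```

Under these assumptions an identical pair receives similarity 1.0, because every pair is at most as similar as it, while the most distant pair receives the smallest value. Other data types would need only a different order relation passed as `less_or_equally_similar`.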
In data mining, the task-specific performances of conventional distance-based similarity measures va...
Clustering is the process of grouping a set of physical or abstract objects into classes of similar ob...
Data clustering is a well-known task in data mining and it often relies on distances or, in some cas...
This paper proposes a new measure for similarity between basket datasets. The new measure is calcula...
Cluster analysis or classification usually concerns a set of exploratory multivariate data analysis ...
Copyright © 2009 Polish Academy of Sciences. Cluster analysis or classification usually concerns a se...
In many domains, we face heterogeneous data with both numeric and categorical ...
Cluster analysis or classification usually concerns a set of exploratory multivariate data analysis ...
Many machine learning and data mining algorithms crucially rely on the similarity metrics. However, ...
Many machine learning and data mining algorithms crucially rely on the similarity metrics. However, ...
Measuring similarities between objects based on their attributes has been an important problem in ma...
Measuring similarities between objects based on their attributes has been an important problem in ma...
In data mining, similarity or distance between attributes is one of the central notions. Such a not...
In this paper, we consider two applications of distributional similarity measures, probability estim...
Measuring similarities between objects based on their attributes has been an important problem in ma...