Abstract. The Universal Similarity Metric (USM) has been demonstrated to give practically useful measures of “similarity” between sequence data. Here we use the USM as an alternative distance metric in a K-Nearest Neighbours (K-NN) learner to allow effective pattern recognition of variable-length sequence data. We compare this USM approach with the commonly used string-to-word vector approach. Our experiments use two data sets from divergent domains: (1) spam e-mail filtering and (2) protein subcellular localisation. Our results on these data show that the USM-based K-NN learner (1) gives predictions with higher classification accuracy than those output by techniques that use the string-to-word vector approach, and (2) can ...
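To make the idea of a compression-based distance inside a K-NN learner concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the USM is approximated, so this sketch assumes the normalised compression distance (NCD) with zlib as the compressor, and the names compressed_size, ncd and knn_predict are hypothetical.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance (assumed approximation of the USM).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Smaller values mean more similar sequences.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query: bytes, training: list[tuple[bytes, str]], k: int = 3) -> str:
    """Majority vote over the k training sequences closest to the query under NCD."""
    neighbours = sorted(training, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

A call such as knn_predict(message_bytes, labelled_messages, k=3) would label a new e-mail from its nearest labelled neighbours. No tokenisation or fixed-length feature vector is needed, which is why this kind of distance suits variable-length sequences; the choice of compressor and of k are free parameters here, not values taken from the paper.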
Many machine learning tasks require similarity functions that estimate likeness between observations...
Comparing protein structures based on their contact maps is an important problem in structural prote...
Abstract Background Sequence similarity networks are useful for classifying and characterizing biolo...
Efficient and expressive comparison of sequences is an essential procedure for learning with sequen...
Similarity between objects plays an important role in both human cognitive pro...
We discuss several approaches to similarity preserving coding of symbol sequences and possible conne...
We analyze an approach to a similarity preserving coding of symbol sequences based on neural distrib...
String kernel-based machine learning methods have yielded great success in practical tasks of struct...
Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic ...
Abstract—In many applications, it is necessary to determine the similarity of two strings. A widely-...
BACKGROUND: The sequencing of the human genome has enabled us to access a comprehensive list of genes...
Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in ...
irements. A simple and computationally very effective "distance" measure for sequences is ...
This paper presents two metrics for the Nearest Neighbor Classifier that share the property of being...
The documents similarity metric is a substantial tool applied in areas such as determining topic in ...