We describe several families of k-mer-based string kernels related to the recently presented mismatch kernel and designed for use with support vector machines (SVMs) for classification of protein sequence data. These new kernels (restricted gappy kernels, substitution kernels, and wildcard kernels) are based on feature spaces indexed by k-length subsequences (“k-mers”) from the string alphabet Σ. However, for all kernels we define here, the kernel value K(x,y) can be computed in O(c_K(|x|+|y|)) time, where the constant c_K depends on the parameters of the kernel but is independent of the size |Σ| of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon th...
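As a rough illustration of what a feature space indexed by k-mers looks like, the sketch below computes the plain exact-match k-mer kernel (often called the spectrum kernel). This is an assumption chosen for simplicity: it is not the restricted gappy, substitution, or wildcard kernel described above, and the function names `kmer_counts` and `spectrum_kernel` are hypothetical. It does show why the cost can depend on the sequence lengths and on k but not on |Σ|: only k-mers that actually occur are ever stored.

```python
from collections import Counter


def kmer_counts(s: str, k: int) -> Counter:
    """Sparse feature vector: how often each contiguous k-mer occurs in s."""
    # A string shorter than k has no k-mers, so the range is empty.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Inner product of the k-mer count vectors of x and y.

    Only k-mers present in the sequences are stored, so the work grows with
    k * (len(x) + len(y)) (expected, using hashing) and not with the
    alphabet size.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; a missing key in a Counter reads as 0.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(count * cy[kmer] for kmer, count in cx.items())


if __name__ == "__main__":
    # Toy protein-like strings, purely illustrative.
    print(spectrum_kernel("MKVLAAGMKV", "AAGMKVLL", 3))
```

The kernels named in the abstract would replace the exact-match feature map with one that tolerates gaps, substitutions, or wildcards; that part is not sketched here.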
Problems of analysis and modeling of sequential data arise in many practical applications. In this w...
Remote homology detection between protein sequences is a central problem in computational biology. D...
Analysis of large-scale sequential data has become an important task in machine learning and pattern...
We introduce a class of string kernels, called mismatch kernels, for use with support vector machine...
We introduce a class of string kernels, called mismatch kernels, for use with support vector machin...
Motivation: Classification of protein sequences into functional and structural families based on se...
Motivation: Classification of protein sequences into functional and structural families based on seq...
Determining protein sequence similarity is an important task for protein classification and homology...
We present a new family of linear time algorithms for string comparison with mismatches under the st...
Biological sequence classification (such as protein remote homology detection) solely based on seque...
Kernel-based machine learning algorithms are versatile tools for biological sequence data analysis. ...
Motivation: Remote homology detection between protein sequences is a central p...