Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. Whil...
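As a rough illustration of the exhaustive (enumerative) strategy described in the abstract above, the following minimal sketch builds one count feature for every possible k-mer over an alphabet. The function names (kmer_counts, kmer_feature_vector) and the DNA alphabet "ACGT" are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_feature_vector(sequence: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Exhaustive feature vector: one count per possible k-mer over `alphabet`.

    The alphabet is an assumption for this sketch; a protein alphabet would
    have 20 letters instead of 4.
    """
    counts = kmer_counts(sequence, k)
    # A Counter returns 0 for k-mers that never occur, so absent patterns
    # still occupy a slot in the vector.
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


vector = kmer_feature_vector("ACGTACGT", k=2)
print(len(vector))  # number of features: len(alphabet) ** k
```

The vector length is the alphabet size raised to the power k, so it grows exponentially with k. This is one way to see why exhaustive generation is limited to short patterns in practice, and why many of the resulting features may be irrelevant or strongly dependent on one another.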
This thesis describes an approach to data-driven discovery of decision trees or rules for assigning ...
Summary. Protein function prediction, i.e. classification of protein sequences according to their bi...
Background: Many open problems in bioinformatics involve elucidating underlying functional signals i...
Discrete motifs that discriminate functional classes of proteins are useful for classifying new sequ...
Background: In protein sequence classification, identification of the sequence motifs or n-grams tha...
Feature extraction is an unavoidable task, especially in the critical step of ...
Doctor of Philosophy. Department of Computing and Information Sciences. Doina Caragea. Recent advancements...
Abstract: Most existing sequence mining algorithms focus on mining for subsequences. A large cla...
Background: Motif discovery aims to detect short, highly conserved patterns in a collection ...
Motivation: Motif identification for sequences has many important applications in biological studies...
DNA sequence decomposition into k-mers (substrings of length k) and their frequency counting, define...
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs ...
Short paper. This paper addresses the discovery of discriminative n-ary motifs in...
www.cs.iastate.edu/~honavar/aigroup.html This paper describes an approach to data-driven discovery o...