Sequence classification is an important problem in many real-world applications. Sequence data often contain no explicit "signals," or features, from which classification algorithms can be built. Extracting and interpreting the most useful features is challenging, yet many classification algorithms depend on hand-constructed features. In this thesis, I address this problem by developing a feature-generation algorithm (FGA). FGA is a scalable method for automatic feature generation for sequences: it identifies sequence components, draws on domain knowledge, systematically constructs features, explores the space of possible features, and identifies the most useful ones. In the domain of biological sequences, splice-sites...
Background: Many open problems in bioinformatics involve elucidating underlying functional signals i...
Motivation: During the last decade, improvements in high-throughput sequencing have generated a weal...
Accurate selection of splice sites during the splicing of precursors to messenger RNA requires both ...
Background: Many open problems in bioinformatics involve elucidating underlying functio...
Background: The identification of relevant biological features in large and complex datasets is an i...
Abstract: A vast amount of sequence data has been generated due to advancements in DNA sequencing tech...
Many open problems in bioinformatics involve elucidating underlying functional signals in biological...
Taher L. Computational methods for splice site prediction. Bielefeld (Germany): Bielefeld University...
Recently, biological sequence databases have grown much faster than the ability of researchers to ann...
Feature selection techniques are often used to reduce data dimensionality, increase classification p...
Motivation: In this age of complete genome sequencing, finding the location and structure of genes i...