We consider the problem of automatically discovering patterns and the corresponding subfamilies in a set of biosequences. The sequences are unaligned and may contain an unknown level of noise. The patterns are of the type used in the PROSITE database. In our approach, patterns and their respective subfamilies are discovered simultaneously. We develop a theoretically substantiated significance measure for a set of such patterns, and an algorithm that approximates the best pattern set and the subfamilies. The approach is based on the minimum description length (MDL) principle. We report a computing experiment that correctly finds subfamilies in the family of chromo domains and reveals new strong patterns.
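To make the pairing of patterns and subfamilies concrete, the following is a minimal sketch in Python. It does not implement the MDL-based significance measure or the discovery algorithm from the paper. It only assumes that a pattern set is already given, translates a common subset of the PROSITE pattern syntax into regular expressions, and groups sequences by the first pattern they match. The function names `prosite_to_regex` and `assign_subfamilies`, the two patterns, and the toy sequences are all hypothetical and chosen purely for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supports a common subset of the syntax: single residues, x (any residue),
    [ABC] (allowed set), {ABC} (forbidden set), repeats such as x(2) or
    x(2,4), and the < and > anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif re.fullmatch(r"\{[A-Z]+\}", core):
            core = "[^" + core[1:-1] + "]"
        elif not re.fullmatch(r"[A-Z]|\[[A-Z]+\]", core):
            raise ValueError(f"unsupported pattern element: {element!r}")
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def assign_subfamilies(sequences, patterns):
    """Group sequence names by the first pattern that occurs in them.

    Sequences matching no pattern are returned separately; in the setting of
    the abstract these would correspond to noise.
    """
    compiled = [re.compile(prosite_to_regex(p)) for p in patterns]
    groups = {p: [] for p in patterns}
    unmatched = []
    for name, seq in sequences.items():
        for p, rx in zip(patterns, compiled):
            if rx.search(seq):
                groups[p].append(name)
                break
        else:
            unmatched.append(name)
    return groups, unmatched


# Toy data, invented for illustration only.
toy_patterns = ["C-x(2)-[DE]-{P}-H", "W-x(2)-[FY]"]
toy_sequences = {"s1": "AACKKDAHGG", "s2": "MMWGGYLL", "s3": "AAAAAA"}
groups, unmatched = assign_subfamilies(toy_sequences, toy_patterns)
```

In this sketch the pattern set is fixed in advance, whereas the approach described above searches for the pattern set and the subfamilies together and scores candidates by description length.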
10.1504/IJDMB.2011.045413 International Journal of Data Mining and Bioinformatics 56611-62
We describe a new approach for identifying sequence similarity between a query sequence and a data b...
10.1109/BIBE.2006.253315 Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIB...
This paper surveys approaches to the discovery of patterns in biosequences and places these approach...
This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in ...
this paper is as follows. This introduction is followed in Section 2 by a brief introduction to some...
The enormous growth of biomolecular databases makes it increasingly important to have fast and autom...
In recent years, we have seen a rapid increase in the available DNA and protein data coming from var...
A method for detecting patterns in biological sequences is described that incorporates rigorous stat...
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging task...
Many tasks of contemporary Molecular Biology rely increasingly on automated techniques for the dis...
The identification of interesting patterns (or subsequences) in biosequences has an important role i...
The emergence of automated high-throughput sequencing technologies has resulted in a huge increase o...
Functionally related genes often appear in each other's neighborhood on the genome; however, the order...
The analysis of sequences is one of the major research areas of bioinformatics. Inspired by this re...