MOTIVATION: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. RESULTS: We developed and evaluated a supervised duplicate detection method based on an ex...
Repetitive structures in biological sequences are emerging as an active focus of research and the un...
BACKGROUND: There has been a surge in studies linking genome structure and gene e...
Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale Hani Z....
GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as t...
Copy number variants (CNV) are associated with phenotypic variation in several species. However, pro...
Genomic sequence duplication is an important mechanism for genome evolution, often resulting in lar...
For metagenomics datasets, datasets of complex polyploid genomes, and other high-variation genomics ...
Wittler R, Marschall T, Schönhuth A, Makinen V. Repeat- and error-aware comparison of deletions. Bio...
Repetitive elements are sequence patterns in the genome which are duplicated in large quantity. They...
Motivation: The number of reported genetic variants is rapidly growing, empowered by ever faster acc...
The massive volumes of data in biological sequence databases provide a remarkable resource for large...
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of man...
Multiple sequence alignment is a prerequisite for most evolutionary and phylogenetic analyses. Previ...