We investigate the application of two trie-based data structures, suffix trees and suffix arrays, to the problem of overlap detection in fragment assembly. Both data structures are analyzed theoretically and experimentally with respect to speed and space. By using heuristics, we greatly reduce the number of calls to the time-consuming dynamic programming step, and we have improved the speed of overlap detection by up to 1,000 times with high accuracy in our DNA sequencing collaboration with Brookhaven National Laboratory. We also study the problem of approximating the maximum space savings in trie structures for unification factoring in logic programming, which is proven to be hard.
1 Introduction
Trie-based data structures for strings have proven themselves in a wide variet...
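The filtering idea described in the abstract, using cheap exact matches to decide which fragment pairs deserve the expensive dynamic programming step, can be illustrated with a small sketch. This is not the authors' implementation: it swaps the suffix tree or suffix array for a plain hash index of k-mers, and the function names (`candidate_pairs`, `overlap_dp`), the seed length `k`, and the thresholds `min_len` and `max_errors` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(reads, k=4):
    """Return index pairs (i, j), i < j, of reads that share an exact k-mer.

    Assumption: a hash index of k-mers stands in for the suffix tree /
    suffix array used in the source; only the filtering role is the same.
    """
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for p in range(len(read) - k + 1):
            index[read[p:p + k]].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def overlap_dp(a, b, min_len=3, max_errors=0):
    """Length of the longest prefix of b (at least min_len) that matches
    some suffix of a within max_errors edits, or 0 if there is none.

    D[i][j] is the minimum edit distance between b[:j] and any suffix of
    a[:i]; the alignment may start anywhere in a at no cost.
    """
    m = len(b)
    prev = list(range(m + 1))  # row 0: empty suffix of a against b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)    # column 0: empty prefix of b costs nothing
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or mismatch
                prev[j] + 1,                           # a[i-1] left unaligned
                cur[j - 1] + 1,                        # b[j-1] left unaligned
            )
        prev = cur
    for j in range(m, min_len - 1, -1):
        if prev[j] <= max_errors:
            return j
    return 0


def overlaps(reads, k=4, min_len=3, max_errors=0):
    """Run the DP only on pairs that pass the k-mer filter."""
    found = []
    for i, j in sorted(candidate_pairs(reads, k)):
        for x, y in ((i, j), (j, i)):
            length = overlap_dp(reads[x], reads[y], min_len, max_errors)
            if length:
                found.append((x, y, length))
    return found
```

For example, `overlaps(["ACGTACGT", "ACGTTTGA", "GGGGCCCC"])` runs the dynamic programming step on a single pair instead of all three, because only the first two reads share a 4-mer. The filter is a heuristic: a true overlap containing no exact run of `k` matching characters would be missed, which is the trade-off between speed and accuracy that the abstract refers to.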
This thesis presents an application of a generalized suffix tree extended by the use of frequency of...
© The Author 2017. Published by Oxford University Press. All rights reserved. The application of adv...
Computing suffix-prefix overlaps for a large collection of strings is a fundam...
The evolution of next-generation sequencing technology increases the demand for efficient soluti...
Despite the prodigious throughput of the sequencing instruments currently on the market, the assembl...
An effective computer program for assembling DNA fragments, the contig assembly program (CAP), has b...
Current DNA sequencers produce huge numbers of short DNA reads. In order to determine a genome sequ...
Finding approximate overlaps is the first phase of many sequence assembly methods. Given a s...
The next-generation sequencing (NGS) technology outputs a huge number of sequences (reads) that requ...
In this thesis we present algorithmic results for computational problems arising in two important ar...
In recent years, bioinformatics has become an important research field because there are more and more ...
The overlap stage of a string graph-based assembler is considered one of the most time- and space-co...
The operation of overlap assembly was defined by Csuhaj-Varju, Petre, and Vaszil as a form...
We consider the problem of estimating the number of sequences overlapping with a certain length fro...
We introduce a data structure called a superword array for quickly finding matches between DNA seque...