Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of $r$ strings of total length $n$ and an error rate $\epsilon$, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance $k = \lceil \epsilon \ell \rceil$, where $\ell$ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam, et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our technique uses $nH_k + o(n \log \sigma) + r \log r$ bits of space, where $H_k$ is the $k$-th order entropy and $\sigma$ the alphabet size. In practice, it scales better in space than q-gram filters (Rasmussen, et al., 2006), with comparable running time. Our method is also easy to parallelize and scales ...
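To make the problem statement concrete, here is a minimal Python sketch of a naive baseline. It is not the backward-backtracking/suffix-filter technique the abstract proposes. It assumes the overlap length ℓ is measured on the prefix of the second string. The function names and the `min_len` cutoff are hypothetical. One semi-global dynamic program per ordered pair gives, for every prefix length ℓ of `b`, the smallest edit distance to any suffix of `a`, in O(|a|·|b|) time. The overlap is kept when that distance is at most ⌈εℓ⌉.

```python
import math
from itertools import permutations


def approx_overlaps(a: str, b: str, eps: float, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (l, d) pairs such that the length-l prefix of b is within edit
    distance d <= ceil(eps * l) of some suffix of a.

    Naive O(|a| * |b|) semi-global DP; a baseline, not an index-based method.
    Assumption: the overlap length l is measured on the prefix of b.
    """
    m = len(a)
    # prev[j] = min edit distance between the previous prefix of b and some
    # substring of a ending at j. Row 0 is all zeros: the alignment may start
    # anywhere in a at no cost.
    prev = [0] * (m + 1)
    found = []
    for i in range(1, len(b) + 1):
        cur = [i] + [0] * m  # b[:i] against an empty substring costs i edits
        for j in range(1, m + 1):
            cost = 0 if a[j - 1] == b[i - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # b[i-1] left unaligned
                         cur[j - 1] + 1)      # a[j-1] left unaligned
        d = cur[m]  # the aligned substring must end at the last character of a
        if i >= min_len and d <= math.ceil(eps * i):
            found.append((i, d))
        prev = cur
    return found


def all_pairs_overlaps(reads: list[str], eps: float,
                       min_len: int) -> dict[tuple[int, int], list[tuple[int, int]]]:
    """Run the pairwise check on every ordered pair of distinct reads."""
    result = {}
    for x, y in permutations(range(len(reads)), 2):
        hits = approx_overlaps(reads[x], reads[y], eps, min_len)
        if hits:
            result[(x, y)] = hits
    return result
```

For example, with ε = 0 and `min_len=3`, `all_pairs_overlaps(["ACGTACGT", "ACGTTTTT"], 0.0, 3)` reports only the exact length-4 overlap `ACGT` from read 0 into read 1. The all-pairs loop is quadratic in the number of reads. Avoiding that quadratic cost is the point of the indexed approach described above.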
The overlap stage of a string graph-based assembler is considered one of the most time- and space-co...
We investigate the application of trie-based data structures, suffix trees and suffix arrays in the ...
We study the number $N_k$ of length-$k$ word matches between pairs of evolutionarily related DNA sequence...
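The abstract is truncated, so its exact convention for $N_k$ is not visible here. Under one common definition, $N_k$ counts all position pairs (i, j) whose length-k words coincide. The following sketch assumes that definition, counts the forward strand only, and uses a hypothetical function name. It counts words in each sequence and multiplies the multiplicities of shared words.

```python
from collections import Counter


def count_word_matches(a: str, b: str, k: int) -> int:
    """N_k under an assumed convention: the number of position pairs (i, j)
    with a[i:i+k] == b[j:j+k] (forward strand only, no reverse complements)."""
    if k <= 0:
        raise ValueError("k must be positive")
    words_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    words_b = Counter(b[j:j + k] for j in range(len(b) - k + 1))
    # Each shared word w contributes (occurrences in a) * (occurrences in b) pairs;
    # Counter returns 0 for words absent from b.
    return sum(n * words_b[w] for w, n in words_a.items())
```

For instance, `count_word_matches("AAAA", "AA", 2)` counts three matches, because `AA` occurs three times in the first sequence and once in the second.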
The next-generation sequencing (NGS) technology outputs a huge number of sequences (reads) that requ...
Computing suffix-prefix overlaps for a large collection of strings is a fundam...
We present a novel algorithmic framework for solving approximate sequence matching problems that per...
The evolution of the next-generation sequencing technology increases the demand for efficient soluti...
We show how to parallelize the optimal algorithm proposed by Tustumi et al. [19] to solve the all-pa...
The next-generation sequencing technology creates a huge number of sequences (reads), which constitu...
Finding all longest suffix-prefix matches for a collection of strings is known as the all-pairs suff...
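The exact all-pairs suffix-prefix problem can be illustrated with a naive Python sketch that runs one Knuth-Morris-Pratt scan per ordered pair of strings. For r strings of total length n, this takes O(r·n) time overall, because each string is scanned once as text and once as pattern per partner. It is a baseline for the problem definition only. It is not the optimal algorithm of Tustumi et al. or any method from the abstracts listed here. The function names are hypothetical. The sketch also assumes an overlap may cover a whole string when one string is a suffix of another, which some formulations exclude.

```python
from itertools import permutations


def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper border of p[:i+1] (KMP failure function)."""
    pi = [0] * len(p)
    q = 0
    for i in range(1, len(p)):
        while q > 0 and p[i] != p[q]:
            q = pi[q - 1]
        if p[i] == p[q]:
            q += 1
        pi[i] = q
    return pi


def longest_suffix_prefix(s: str, t: str) -> int:
    """Length of the longest suffix of s that equals a prefix of t."""
    if not t:
        return 0
    pi = prefix_function(t)
    q = 0  # length of the prefix of t currently matched against the text s
    for c in s:
        if q == len(t):  # t matched completely; fall back before extending
            q = pi[q - 1]
        while q > 0 and c != t[q]:
            q = pi[q - 1]
        if c == t[q]:
            q += 1
    return q  # state after the last character of s = longest suffix/prefix match


def all_pairs_suffix_prefix(strings: list[str]) -> dict[tuple[int, int], int]:
    """Naive APSP: one KMP scan for every ordered pair (i, j) with i != j."""
    return {(i, j): longest_suffix_prefix(strings[i], strings[j])
            for i, j in permutations(range(len(strings)), 2)}
```

For example, `longest_suffix_prefix("ACGTAC", "TACGG")` finds the shared `TAC`, so it returns 3. The reverse direction returns 0.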
All-pairs suffix-prefix matching is an important part of DNA sequence assembly where it is the most ...
Graduation date: 1993. As the volume of genetic sequence data increases due to improved sequencing ...
Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core...
Let T be a text of length n and P be a pattern of length m, both strings over a fixed finite alphabe...