Approximate all-pairs suffix/prefix overlaps

Välimäki, Niko
Ladra, Susana
Mäkinen, Veli

Open link

Publication date

April 2012

DOI

10.1016/j.ic.2012.02.002

Publisher

Elsevier Inc.

Abstract

AbstractFinding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length n and an error-rate ϵ, the goal is to find, for all-pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k=⌈ϵℓ⌉, where ℓ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam, et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our technique uses nHk+o(nlogσ)+rlogr bits of space, where Hk is the k-th order entropy and σ the alphabet size. In practice, it is more scalable in terms of space, and comparable in terms of time, than q-gram filters (Rasmussen, et al., 2006). Our method is also easy to parallelize and scales ...

Extracted data

We use cookies to provide a better user experience.

Data Protection

Approximate all-pairs suffix/prefix overlaps

Abstract

Extracted data

Approximate all-pairs suffix/prefix overlaps

Abstract

Extracted data

Related items

Related items