Online persistent suffix tree construction has been considered impractical due to its excessive I/O costs. However, prior studies have not taken into account the effects of the buffer management policy and the internal node structure of the suffix tree on the I/O behavior of construction and of subsequent retrievals over the tree. We study these two issues in detail in the context of large genomic DNA and protein sequences. In particular, we make the following contributions: (i) a novel, low-overhead buffering policy called TOP-Q, which improves the on-disk behavior of suffix tree construction and subsequent retrievals, and (ii) empirical evidence that the space-efficient linked-list representation of suffix tree nodes provides significantly ...
Suffix-trees are popular indexing structures for various sequence processing problems in biological ...
Abstract. Our aim is to develop new database technologies for the approximate matching of unstructur...
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorith...
Online persistent suffix tree construction has been considered impractical due to its excessive I/O ...
Mammalian genomes are typically 3 Gbp (gigabase pairs) in size. The largest public database NCBI (Na...
The suffix tree is a well known and popular indexing structure for various sequence processing probl...
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when ...
In this study, we present an online suffix tree construction approach where multiple sequences are i...
Abstract. Suffix trees have been established as one of the most versatile index structures for unstr...
Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a co...
The construction of a suffix tree for very long sequences is essential for many applications, and it p...
The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many pr...
In recent years, bioinformatics has become an important research field because there are more and more ...
Abstract. Suffix-trees are popular indexing structures for various sequence processing problems in b...