We propose a new method to build persistent suffix trees for indexing genomic data. Our algorithm, DiGeST (Disk-Based Genomic Suffix Tree), improves significantly over previous work by reducing random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm and uses a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than previously managed.
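The two ideas named in this abstract, a compact binary encoding of the DNA alphabet and a two-phase multi-way merge sort, can be illustrated with a small in-memory sketch. This is not the DiGeST implementation: the function names (`pack`, `sorted_runs`, `merge_runs`), the 2-bit code assignment, and the chunk size are assumptions made for illustration, and a real disk-based method would write each sorted run to disk instead of keeping it in a Python list.

```python
import heapq

# Assumed 2-bit code for the four-letter DNA alphabet (illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(dna):
    """Pack a DNA string into an integer, 2 bits per base.

    The first base ends up in the most significant bits. The length must be
    stored separately, because leading 'A's encode as zero bits.
    """
    bits = 0
    for base in dna:
        bits = (bits << 2) | CODE[base]
    return bits


def sorted_runs(dna, chunk):
    """Phase 1: sort the suffix start positions of each chunk independently."""
    runs = []
    for start in range(0, len(dna), chunk):
        positions = range(start, min(start + chunk, len(dna)))
        runs.append(sorted(positions, key=lambda i: dna[i:]))
    return runs


def merge_runs(dna, runs):
    """Phase 2: multi-way merge of the sorted runs into one suffix order."""
    return list(heapq.merge(*runs, key=lambda i: dna[i:]))


if __name__ == "__main__":
    dna = "GATTACA"
    order = merge_runs(dna, sorted_runs(dna, chunk=3))
    for i in order:
        print(i, dna[i:])
```

The merged list of start positions is the lexicographic order of all suffixes, which is the order in which a suffix tree's leaves would be laid out. Comparing suffixes by slicing the string is only practical for toy inputs.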
With advances in sequencing technology and through aggressive sequencing efforts, DNA sequence data...
Sequence data is one of the rapidly growing types of data. New efficient and scalable techniques are...
The construction of suffix tree for very long sequences is essential for many applications, and it p...
The suffix tree is a well known and popular indexing structure for various sequence processing probl...
Mammalian genomes are typically 3 Gbp (gigabase pairs) in size. The largest public database NCBI (Na...
Suffix-trees are popular indexing structures for various sequence processing problems in biological...
Abstract. Suffix-trees are popular indexing structures for various sequence processing problems in b...
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when ...
With advances in high-throughput sequencing methods, and the corresponding exponential growth in seq...
In recent years, bioinformatics has become an important research field because there are more and more ...
Online persistent suffix tree construction has been considered impractical due to its excessive I/O ...
Abstract. This thesis makes three contributions in the area of computing science. Our first contr...
This thesis makes three contributions in the area of computing science. Our first contribution is th...
Online persistent suffix tree construction has been considered impractical due to its excessive I/O...
Abstract. Our aim is to develop new database technologies for the approximate matching of unstructur...