LNCS v. 6124 is the Proceedings of the 6th International Conference, AAIM 2010.
To study the genetic variations of a species, one basic operation is to search for occurrences of patterns in a large number of very similar genomic sequences. To build an indexing data structure on the concatenation of all sequences may require a lot of memory. In this paper, we propose a new scheme to index highly similar sequences by taking advantage of the similarity among the sequences. To store r sequences with k common segments, our index requires only O(n + N log N) bits of memory, where n is the total length of the common segments and N is the total length of the distinct regions in all texts. The total length of all sequences is rn + N, and any scheme to stor...
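To give a feel for the quantities in the abstract above, the following minimal sketch compares the total text length rn + N with the growth term n + N log N of the stated index bound. It is only an illustration: the function names and the constant factor `c` are assumptions made here, the O(·) bound hides constants that the abstract does not give, and the first quantity is measured in characters while the second is in bits.

```python
import math


def concatenated_length(r: int, n: int, N: int) -> int:
    """Total length (in characters) of all r sequences, rn + N as in the abstract."""
    return r * n + N


def index_bits_estimate(n: int, N: int, c: float = 1.0) -> float:
    """Growth term c * (n + N log2 N) of the stated space bound, in bits.

    c is a hypothetical constant; the abstract only gives the asymptotic form.
    """
    distinct_term = N * math.log2(N) if N > 0 else 0.0
    return c * (n + distinct_term)


if __name__ == "__main__":
    # Hypothetical collection: 100 sequences sharing 1,000,000 common
    # characters, with 10,000 characters of distinct regions in total.
    r, n, N = 100, 1_000_000, 10_000
    print("total characters:", concatenated_length(r, n, N))
    print("index growth term (bits, c = 1):", index_bits_estimate(n, N))
```

With many sequences and few differences (large r, small N), the rn term dominates the text length, while the index term stays close to n; that is the regime the abstract targets.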
Abstract—Detecting similar pairs in large biological sequence collections is one of the most commonl...
Motivation: The search for exact matches of substrings in pairs of large genomic sequence...
Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core...
Part 8: First Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012). Interna...
We present a fast algorithm for sequence clustering and searching which works with large sequence da...
Biology researchers have a pressing need for data management technologies which will make the storag...
The collection indexing problem is defined as follows: Given a collection of highly similar strings,...
Finding the sequence similarity between two genetic codes is an important problem in computational b...
The minimal-length encoding approach is applied to define the concept of sequence similarity. A sequence...
We consider the problem of similarity search in a very large sequence database with edit distance as...
Background: Searching for small tandem/disperse repetitive DNA sequences streamlines many bi...
Motivation: Comparison of nucleic acid and protein sequences is a fundamental tool of modern bioinfo...
Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recentl...
Sequence data is one of the rapidly growing types of data. New efficient and scalable techniques are...
Searching patterns in the DNA sequence is an important step in biological research. To s...