We consider the problem of similarity search in a very large sequence database with edit distance as the similarity measure. Given limited main memory, our goal is to develop a reference-based index that reduces the number of costly edit distance computations in order to answer a query. The idea in reference-based indexing is to select a small set of reference sequences that serve as a surrogate for the other sequences in the database. We consider two novel strategies for selecting references as well as a new strategy for assigning references to database sequences. Our experimental results show that our selection and assignment methods far outperform competitive methods. For example, our methods prune up to 20 times as many sequences as the...
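The abstract above does not spell out how references cut down on edit distance computations. A common mechanism, assumed here, is the triangle inequality: distances from every database sequence to each reference are precomputed, and at query time they give lower and upper bounds on the true distance. The following is a minimal sketch under that assumption, not the authors' method. The function names (`build_index`, `range_query`) are hypothetical, every sequence is compared against every reference, and the reference set is supplied by the caller, so the selection and assignment strategies the abstract describes are not modeled.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def build_index(database: List[str], references: List[str]) -> List[List[int]]:
    """Precompute the distance from every database sequence to every reference."""
    return [[edit_distance(s, r) for r in references] for s in database]


def range_query(query: str, radius: int, database: List[str],
                references: List[str],
                index: List[List[int]]) -> Tuple[List[str], int]:
    """Return sequences within `radius` of `query`, plus the number of
    query-to-sequence edit distances that actually had to be computed."""
    q_to_ref = [edit_distance(query, r) for r in references]
    hits: List[str] = []
    computed = 0
    for s, s_to_ref in zip(database, index):
        # Triangle inequality: |d(q,r) - d(s,r)| <= d(q,s) <= d(q,r) + d(s,r)
        lower = max(abs(qr - sr) for qr, sr in zip(q_to_ref, s_to_ref))
        upper = min(qr + sr for qr, sr in zip(q_to_ref, s_to_ref))
        if lower > radius:
            continue              # pruned: cannot be within the radius
        if upper <= radius:
            hits.append(s)        # accepted without computing d(q, s)
            continue
        computed += 1             # bounds are inconclusive: compute exactly
        if edit_distance(query, s) <= radius:
            hits.append(s)
    return hits, computed
```

With the database `["ACGT", "ACGA", "TTTT", "GGGGGGGG"]`, the single reference `"ACGT"`, the query `"ACGG"`, and a radius of 1, this sketch returns `["ACGT", "ACGA"]` after computing only one query-to-sequence distance. The other three sequences are resolved from the bounds alone.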
We present a fast algorithm for sequence clustering and searching which works with large sequence da...
We consider the problem of finding similar patterns in a time sequence. Typical application...
We propose an indexing method for time sequences for processing similarity queries. We use the Discr...
This paper introduces a novel method, called Reference-Based String Alignment (RBSA), that speeds up...
LNCS v. 6124 is Proceedings of the 6th International Conference, AAIM 2010. To study the genetic varia...
In this paper, we consider the problem of efficient matching and retrieval of sequences of different...
Biology researchers have a pressing need for data management technologies which will make the storag...
A similarity query is to find from a collection of items those that are similar to a given query ite...
Database sequence comparison applications compare a query sequence with each sequence in a d...
Database sequencing applications such as sequence comparison process large sequences and con...
We address the problem of similarity search in large time series databases. We introduce a novel ind...
Sequence data is one of the rapidly growing types of data. New efficient and scalable techniques are...
We propose an indexing method for time sequences for processing similarity queries. We use the Dis...
Edit distance is the most widely used method to quantify similarity between two strings. We investig...
We propose an indexing method for time sequences for processing similarity queries. We use...