In this thesis, we study efficient exact query processing algorithms for edit similarity queries and their variants. An edit similarity query finds the database strings within an edit distance threshold t of the query string. Such queries play an important role in many application areas, such as deduplication, data integration and cleansing, query suggestion, bioinformatics, and pattern recognition. Consequently, there has been much interest in developing efficient algorithms for this problem. We study three specific problems in this thesis. The first is the design of efficient algorithms for similarity joins with an edit distance constraint. Currently, the most prevalent approach is based on extracting overlapping grams from strings and considering only string...
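To make the query definition above concrete, here is a minimal brute-force sketch in Python. It is not the thesis's algorithm: it scans every database string, applies a simple length filter, and verifies candidates with the standard dynamic-programming edit distance. The function names `edit_distance` and `edit_similarity_query` and the sample strings are hypothetical, chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity_query(database, query, t):
    """Return the database strings within edit distance t of query.

    Illustrative linear scan (an assumption, not an indexed method).
    """
    results = []
    for s in database:
        # Length filter: a length difference above t already rules s out.
        if abs(len(s) - len(query)) > t:
            continue
        if edit_distance(s, query) <= t:
            results.append(s)
    return results


if __name__ == "__main__":
    db = ["kitten", "sitting", "mitten", "bitten", "kitchen"]
    print(edit_similarity_query(db, "kitten", 1))
```

The linear scan costs one quadratic-time verification per surviving string, which is the cost that filtering approaches such as gram-based candidate generation aim to avoid.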
A similarity query is to find from a collection of items those that are similar to a given query ite...
In many database applications involving string data, it is common to have near neighbor queries (ask...
Approximate query processing based on multiple similarity metrics is prevalent and essential for man...
We study the string similarity search problem with edit-distance constraints, which, given a set of ...
As an essential operation in data cleaning, the similarity join has attracted considerable attention...
The string similarity join, which is employed to find similar string pairs from string sets...
Edit distance is the most widely used method to quantify similarity between two strings. We investig...
Edit distance is widely used for measuring the similarity between two strings. As a primiti...
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem ...
String similarity join is a basic and essential operation in many applications. In this paper, we in...
© 2017 IEEE. String similarity search is a fundamental query that has been widely used for DNA seque...
Given a collection of strings, the goal of approximate string matching is to efficiently find the st...
Fast similarity search is important for time-sensitive applications. Those include both enterprise a...
A string similarity join finds similar pairs between two collections of strings. Many appli...