In this paper we study similarity join and search on multi-attribute data. Traditional methods on single-attribute data have pruning power only on single attributes and cannot efficiently support multi-attribute data. To address this problem, we propose a prefix tree index which has holistic pruning ability on multiple attributes. We propose a cost model to quantify the prefix tree, which can guide the prefix tree construction. Based on the prefix tree, we devise a filter-verification framework to support similarity search and join on multi-attribute data. The filter step prunes a large number of dissimilar results and identifies some candidates using the prefix tree, and the verification step verifies the candidates to generate the final answ...
As an essential operation in data cleaning, the similarity join has attracted considerable attention...
In this thesis, we consider the problem of processing similarity queries over a dataset of top-k ran...
A similarity join aims to find all similar pairs between two collections of records. Established alg...
As two important operations in data cleaning, similarity join and similarity search have attracted m...
Similarity joins are some of the most useful and powerful data processing techniques. They...
We study the string similarity search problem with edit-distance constraints, which, given a set of ...
Similarity Join plays an important role in data integration and cleansing, record linkage and data d...
Given a set of records, a threshold value t and a similarity function, we investigate the problem of...
In this thesis, we study efficient exact query processing algorithms for edit similarity queries and...
The tree similarity join computes all similar pairs in a collection of trees. Two trees are similar ...
Set similarity join is an essential operation in data integration and big data analytics that finds...
The text similarity join operator joins two relations if their join attributes are textually s...
A similarity join aims to find all similar pairs between two collections of records. Established app...
Metric databases are databases where a metric distance function is defined for pairs of database obj...