Data cleaning has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples that are "most related" to a given query tuple. Various techniques have been proposed in the literature for efficiently identifying approximate matches to a query string against a single attribute of a relation. Besides constructing a ranking (i.e., an ordering) of these matches, the techniques often associate with each match a score that quantifies the extent of the match. Since the query tuple may contain multiple attributes, issuing approximate match operations for each of them separately will effectively create a ...
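As a rough illustration of the per-attribute matching described above, the following minimal sketch scores every value of one attribute against a query string and orders the rows by that score. It is not the method of the paper: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the helper name `rank_attribute`, and the sample relation are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def rank_attribute(query, column):
    """Rank the values of one attribute by string similarity to `query`.

    Returns (row_id, score) pairs, best match first. The score comes from
    SequenceMatcher.ratio(), an assumed stand-in for whichever approximate
    match measure a real system would use.
    """
    scored = [
        (row_id, SequenceMatcher(None, query.lower(), value.lower()).ratio())
        for row_id, value in enumerate(column)
    ]
    # Highest score first; sorted() is stable, so ties keep row order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical relation with two attributes (name, city).
names = ["John Smith", "Jon Smyth", "Jane Doe"]
cities = ["Toronto", "Toronto", "Torino"]

# Hypothetical query tuple with one value per attribute.
query_tuple = ("John Smith", "Toronto")

# One approximate match operation per attribute yields one ranking each.
name_ranking = rank_attribute(query_tuple[0], names)
city_ranking = rank_attribute(query_tuple[1], cities)
```

Each attribute produces its own scored ranking, and the two lists need not agree: here the first two rows tie on `city` but not on `name`. This is the situation the abstract describes when approximate match operations are issued separately for each attribute.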
Many database applications require similarity based retrieval on stored text and/or multimedia objec...
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment s...
Matching dependencies were recently introduced as declarative rules for data cleaning and entity res...
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A v...
We present a system for indexing large sets of records, and retrieving exact and approximate matches...
Information systems apply various techniques to rank query answers. Ranking queries (or top...
Rank aggregation has recently been proposed as a useful abstraction that has several applications, i...
We propose a partial ordering that approximates a ranking of the items in a database according to t...
We investigate the problem of creating and analyzing samples of relational databases to find relatio...
In this paper we address the problem of data cleaning when multiple data sources are merged to creat...
In various applications such as data cleansing, being able to retrieve categorical or numerical attr...
Similarity joins are troublesome database operators that often produce results much larger than the ...
This dissertation focuses on supporting ranking in relational database systems through a rank-aware ...
Due to imprecise query intention, Web database users often use a limited number of keywords that are...
Ranking queries and similarity queries are elementary operations with many important applications. T...