Today, myriad data sources, from the Internet to business operations to scientific instruments, produce large volumes of data of many different types. Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological applications, call for identifying and processing similarities in big data. As a result, it is imperative to develop new similarity query processing approaches and systems that scale from low-dimensional to high-dimensional data, from a single machine to clusters of hundreds of machines, and from disk-based to memory-based processing. This dissertation introduces and studies several similarity-aware query operators and analyzes and optimizes their performance. The first contribution of this dissertati...
In this thesis, we study the Hamming distance query problem. Hamming distance measures the number of...
Due to the increasing complexity of current digital data, similarity search has become a fundame...
An important feature of a database management system (DBMS) is its client/server architecture, wher...
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological ap...
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological ap...
A similarity query is to find from a collection of items those that are similar to a given query ite...
Data analysts spend more than 80% of their time on data cleaning and integration in the whole process of d...
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological ...
This paper considers a multi-query optimization issue for distributed similarity query processing, w...
Metric databases are databases where a metric distance function is defined for pairs of database obj...
Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support s...
Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support s...
Similarity joins have been studied as key operations in multiple application domains, e.g., record l...
Similarity joins have been studied as key operations in multiple application domains, e.g., record l...
As database technology is applied to more and more application domains, user queries are becoming in...