A novel access structure for similarity search in metric data, called Similarity Hashing (SH), is proposed. Its multi-level hash structure of separable buckets on each level supports easy insertion and bounded search costs, because at most one bucket needs to be accessed at each level for range queries up to a pre-defined value of the search radius. At the same time, the number of distance computations is always significantly reduced by use of pre-computed distances obtained at insertion time. Buckets of static files can be arranged in such a way that the I/O costs never exceed the costs to scan a compressed sequential file. Experimental results demonstrate that the performance of SH is superior to the available tree-based structures. Contrary to tree ...
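The abstract does not spell out how the buckets are made separable, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each level splits its objects around a pivot at the median distance, leaving an exclusion zone of width 2*rho between an inner and an outer bucket, and that objects falling into the zone are passed on to the next level. The names `Level`, `build`, `range_query`, `rho`, and `max_levels` are hypothetical. Under these assumptions the triangle inequality guarantees that a range query with radius r <= rho touches at most one bucket per level, and the pivot distances stored at insertion time let many objects be discarded without a new distance computation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Level:
    pivot: object
    median: float
    # Each bucket stores (object, pre-computed distance to the pivot).
    inner: List[Tuple[object, float]] = field(default_factory=list)
    outer: List[Tuple[object, float]] = field(default_factory=list)


def build(objects, dist: Callable, rho: float, max_levels: int):
    """Build the levels; returns (levels, exclusion_bucket)."""
    levels, remaining = [], list(objects)
    for _ in range(max_levels):
        if not remaining:
            break
        pivot = remaining[0]  # assumption: any pivot choice works for the sketch
        dists = [dist(pivot, o) for o in remaining]
        median = sorted(dists)[len(dists) // 2]
        level, excluded = Level(pivot, median), []
        for obj, d in zip(remaining, dists):
            if d <= median - rho:
                level.inner.append((obj, d))
            elif d > median + rho:
                level.outer.append((obj, d))
            else:
                excluded.append(obj)  # exclusion zone: handled by the next level
        levels.append(level)
        remaining = excluded
    return levels, remaining


def range_query(levels, exclusion, dist: Callable, q, r: float, rho: float):
    """Return all objects within distance r of q; requires r <= rho."""
    if r > rho:
        raise ValueError("the one-bucket-per-level guarantee needs r <= rho")
    hits = []
    for level in levels:
        dq = dist(q, level.pivot)
        # Separability: only one of the two buckets can contain a match.
        bucket = level.inner if dq <= level.median else level.outer
        for obj, d_op in bucket:
            # Triangle-inequality filter on the stored pivot distance
            # runs first, so dist(q, obj) is computed only when needed.
            if abs(dq - d_op) <= r and dist(q, obj) <= r:
                hits.append(obj)
    hits.extend(o for o in exclusion if dist(q, o) <= r)
    return hits


if __name__ == "__main__":
    def absdiff(a, b):
        return abs(a - b)

    data = list(range(100))
    lv, excl = build(data, absdiff, rho=2.0, max_levels=4)
    print(sorted(range_query(lv, excl, absdiff, 50.3, 1.5, rho=2.0)))
```

In this sketch the final exclusion bucket is always scanned, so the bounded cost is one bucket per level plus that remainder; how the original structure stores and bounds the remainder is not stated in the abstract.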
Research Doctorate - Doctor of Philosophy (PhD). This thesis presents techniques for accelerating simi...
Abstract. Metric access methods (MAMs) serve as a tool for speeding up similarity queries. However, all...
In this paper, we focus on indexing and searching in high-dimensional data. To achieve the target we...
The nearest- or near-neighbor query problems arise in a large variety of database applications, usua...
Abstract. In this paper we present a scalable and distributed access structure for similarity search...
Metric databases are databases where a metric distance function is defined for pairs of database obj...
Similarity search consists of retrieving all objects within a database that are similar or relev...
Similarity search is the basis for many data analytics techniques, including k-nearest neighbor clas...
This paper proposes a new hashing framework to conduct similarity search via the following step...
The need for a retrieval based not on the attribute values but on the very data content has recentl...
Given a set of entities, the all-pairs similarity search aims at identifying all pairs of entities t...
A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficien...
Hashing is very useful for fast approximate similarity search on large databases. In the unsupervised...
Similarity search is a very important operation in multimedia databases and other database applicati...
Similarity search in metric spaces is a general paradigm that can be used in several application fie...