Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., the inner product between binary vectors) or set containment. Minhash has an inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH) as a solution to this well-known problem. The new scheme uses asymmetric transformations to cancel the bias of traditional minhash towards smaller sets, making the final "collision probability" monotonic in the inner product...
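The claim that an asymmetric transformation can make the collision probability monotonic in the inner product can be checked on toy sets. Because minhash collides with probability equal to the resemblance |A ∩ B| / |A ∪ B|, it is enough to compare exact resemblances. The sketch below assumes one padding scheme chosen for illustration: every data set is padded with dummy items up to an assumed maximum set size M, while the query is left unpadded. The names `transform_data`, `transform_query`, and `M` are hypothetical, and this is not claimed to be the paper's exact construction.

```python
from typing import Hashable, Set


def resemblance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Exact resemblance |A ∩ B| / |A ∪ B| (the minhash collision probability)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transform_data(x: Set[int], max_size: int) -> Set[Hashable]:
    """Hypothetical data-side transform: pad x up to the assumed maximum size M.

    Padding items are tagged tuples, which never equal an int item, so they
    can never overlap the (unpadded) query.
    """
    if len(x) > max_size:
        raise ValueError("data set is larger than the assumed maximum size M")
    padding = {("pad", i) for i in range(max_size - len(x))}
    return set(x) | padding


def transform_query(q: Set[int]) -> Set[Hashable]:
    """Hypothetical query-side transform: the query is left unpadded."""
    return set(q)


if __name__ == "__main__":
    M = 10
    q = {1, 2, 3}
    small = {1, 2}             # overlap with q: 2
    large = set(range(1, 11))  # overlap with q: 3
    # Plain resemblance favours the small set despite its smaller overlap.
    print(resemblance(q, small), resemblance(q, large))
    # After the asymmetric padding, the ranking follows the overlap.
    print(resemblance(transform_query(q), transform_data(small, M)),
          resemblance(transform_query(q), transform_data(large, M)))
```

Under this padding every data set has size M, so with a = |q ∩ x| the resemblance becomes a / (M + |q| - a). For a fixed query this is increasing in a, which is why ranking by it ranks by inner product rather than penalizing small overlaps with small sets.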
We consider the problem of indexing a text T (of length n) with a light data structure that supports...
MinHash sketching is an important algorithm for efficient document retrieval a...
© 2017 IEEE. Learning-based hashing has become increasingly popular because of its high efficiency i...
Minwise hashing is a standard technique in the context of search for approximating set similarities...
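As background for the minwise hashing entries in this list, the textbook estimator works as follows: for a random permutation π, Pr[min π(A) = min π(B)] = |A ∩ B| / |A ∪ B|, so the fraction of matching minima over k independent hash functions estimates the resemblance. This sketch assumes that a seeded BLAKE2b hash is an acceptable stand-in for a random permutation; `minhash_signature`, `estimate_resemblance`, and the seeding scheme are hypothetical names chosen for illustration.

```python
import hashlib
from typing import Hashable, List, Set


def _hash(seed: int, item: Hashable) -> int:
    # 64-bit pseudo-random hash of (seed, item); each seed plays the role of
    # one independent random permutation of the universe.
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: Set[Hashable], num_hashes: int) -> List[int]:
    """Minimum hash value of the set under each of num_hashes hash functions."""
    if not items:
        raise ValueError("minhash is undefined for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same hash functions")
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = set(range(0, 100))
    b = set(range(50, 150))  # exact resemblance = 50 / 150 = 1/3
    k = 512
    print(estimate_resemblance(minhash_signature(a, k), minhash_signature(b, k)))
```

With k hash functions the standard error of the estimate is about sqrt(R(1 - R)/k), so for R = 1/3 and k = 512 the printed value should typically land within a few hundredths of 0.333.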
We investigate probabilistic hashing techniques for addressing computational and memory challenges i...
© 2017 IEEE. Learning to hash has attracted broad research interest in recent computer vision and m...
Minwise hashing is a standard technique in the context of search for efficiently computing set simil...
Compact hash code learning has been widely applied to fast similarity search owing to its significan...
Minwise hashing is a standard technique in the context of search for approximating set similarities....
MinHash and SimHash are the two widely adopted Locality Sensitive Hashing (LSH) algorithms for larg...
In information retrieval, efficiently accomplishing the nearest neighbor search on large scal...
Min-wise hashing is an important method for estimating the size of the intersection of sets, based o...
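The link between resemblance and intersection size is a one-line rearrangement: since R = |A ∩ B| / (|A| + |B| - |A ∩ B|), solving for the intersection gives |A ∩ B| = R(|A| + |B|) / (1 + R). The sketch below applies that identity, assuming the exact set sizes are known and R is either exact or a minhash estimate. The function name is hypothetical, and the truncated snippet does not say whether the cited method uses this particular estimator.

```python
def intersection_from_resemblance(r: float, size_a: int, size_b: int) -> float:
    """Invert R = I / (|A| + |B| - I) for the intersection size I.

    Hypothetical helper: r may be an exact resemblance or a minhash estimate.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("resemblance must lie in [0, 1]")
    return r * (size_a + size_b) / (1.0 + r)


if __name__ == "__main__":
    a = set(range(0, 100))
    b = set(range(50, 150))
    r = len(a & b) / len(a | b)  # exact resemblance, 50 / 150
    print(intersection_from_resemblance(r, len(a), len(b)))  # approximately 50
```

Because the result scales with |A| + |B|, any error in an estimated R is amplified for large sets.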
Minwise hashing is a standard technique in the context of search for efficiently computing set simil...
Binary coding or hashing techniques are recognized as a means to accomplish efficient near neighbor search, and...
A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficien...