Effective partitioning of multimedia indexes is key to efficient kNN search. However, existing algorithms partition by document similarity alone, with no constraints on partition size or redundancy. Our goal is to create an index partitioning algorithm that addresses the specific requirements of a distributed system: load balancing across nodes, redundancy under node failure, and efficient node usage under concurrent querying. We propose representing data with overcomplete codebooks. Each document is quantized into a small set of codewords and indexed on per-codeword partitions. Quantization algorithms are designed to fit the data as closely as possible, which biases them toward codewords that fit the principal directions of the data in the original space. In this...
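The per-codeword indexing idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how documents are quantized, so the sketch assumes a simple stand-in rule (keep the `s` codewords with the largest absolute inner product). The names `quantize`, `build_partitions`, `search`, and the parameters `s` and `k` are hypothetical.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quantize(vector, codebook, s):
    """Return the indices of the s codewords that best match the vector.

    Assumption: "best match" means largest absolute inner product. This is
    a simple stand-in for whatever sparse quantizer the paper uses.
    """
    ranked = sorted(range(len(codebook)),
                    key=lambda j: -abs(dot(vector, codebook[j])))
    return ranked[:s]


def build_partitions(docs, codebook, s):
    """Index every document on the partition of each of its s codewords."""
    partitions = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for j in quantize(vec, codebook, s):
            partitions[j].append(doc_id)
    return partitions


def search(query, docs, codebook, partitions, s, k):
    """Approximate kNN: scan only the partitions selected by the query."""
    candidates = set()
    for j in quantize(query, codebook, s):
        candidates.update(partitions.get(j, []))

    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, docs[i]))

    return sorted(candidates, key=sq_dist)[:k]
```

Because each document is stored in `s` partitions, it stays reachable when the node holding one of them fails, as long as `s` is greater than 1. The sketch does nothing to balance partition sizes, which is the bias problem the abstract raises.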
In order to achieve large scalability, indexing structures are usually distributed to incorporate m...
In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index...
In order to achieve large scalability, indexing structures are usually distributed to incorporate mo...
This paper addresses the problem of balanced, redundant indexing of media information. Our goal is t...
The creation of very large-scale multimedia search engines, with more than one billion images and v...
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform fo...
In recent years, there is an ever-increasing research focus on Bag-of-Words based near duplicate vis...
One of the largest problems associated with content-based indexing of multi-media documents is the ...
Nowadays, data partition plays an important role in eliminating duplicate data in green storage and ...
We give new space/time tradeoffs for compressed indexes that answer document retrieval queri...
Careful architectural decisions are required in order to create a highly available and scalable sear...
The paper proposes a unified model for multimedia data retrieval which includes data representatives...
Index partitioning techniques - where indexes are broken into multiple distinct sub-indexes - are a ...
In this paper, we introduce a new collection selection strategy to be operated in search engines wit...
Many algorithms for approximate nearest neighbor search in high-dimensional sp...