This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index so as to exploit the properties of distributed systems: balanced load across nodes, redundancy when a node fails, and efficient node usage under concurrent querying. We follow an information compression approach and propose to represent the data with overcomplete codebooks, where each document is represented by only a few codewords and each indexing node is responsible for several codewords. Quantization algorithms are designed to fit the original data as closely as possible, which biases them towards codewords that fit the principal directions of the data. In this paper, we propose the balanced K...
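To make the codeword-based partitioning concrete, the following is a minimal sketch and not the method proposed in the paper (whose description is truncated above). It assumes a fixed, hand-written codebook, picks the few codewords with the largest absolute inner product for each document, and places codeword j on node j % num_nodes. The function names, the placement rule, and the parameter s are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def top_codewords(vec, codebook, s):
    """Indices of the s codewords with the largest absolute inner product with vec."""
    scores = [abs(sum(v * c for v, c in zip(vec, cw))) for cw in codebook]
    # sorted() is stable, so ties are broken by the lower codeword index.
    return sorted(range(len(codebook)), key=lambda j: -scores[j])[:s]


def build_index(docs, codebook, num_nodes, s=2):
    """Build one posting table per node: node -> {codeword index -> [doc ids]}."""
    index = [defaultdict(list) for _ in range(num_nodes)]
    for doc_id, vec in docs.items():
        for j in top_codewords(vec, codebook, s):
            # Assumed placement rule: codeword j is owned by node j % num_nodes.
            index[j % num_nodes][j].append(doc_id)
    return index


def search(query, codebook, index, s=2):
    """Collect candidate documents by visiting only the nodes that own the query's codewords."""
    num_nodes = len(index)
    candidates = set()
    for j in top_codewords(query, codebook, s):
        candidates.update(index[j % num_nodes].get(j, []))
    return candidates


# Toy data: an overcomplete codebook (4 codewords in 2 dimensions) and two documents.
codebook = [(1, 0), (0, 1), (1, 1), (1, -1)]
docs = {"a": (1.0, 0.1), "b": (0.1, 1.0)}
index = build_index(docs, codebook, num_nodes=2, s=2)
```

Because each document is posted under more than one codeword, it can be reached through more than one entry in the index. A balanced scheme would additionally need the codewords to be used about equally often, which this sketch does not attempt.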
This thesis studies problems related to compressed full-text indexes. A full-text index is a data st...
Indexing the Web and meeting the throughput, response-time, and failure-resilience requirements of a...
Compressed full-text indexes have been one of pattern matching’s most important success st...
Effective partitioning of multimedia indexes is key for efficient kNN search. But existing algorithms a...
The creation of very large-scale multimedia search engines, with more than one billion images and v...
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform fo...
In recent years, there has been an ever-increasing research focus on Bag-of-Words based near duplicate vis...
Careful architectural decisions are required in order to create a highly available and scalable sear...
One of the largest problems associated with content-based indexing of multi-media documents is the ...
We give new space/time tradeoffs for compressed indexes that answer document retrieval queri...
Current persistent search systems (e.g. Google Alerts, CNN Alerts) are built on traditional search e...
Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacit...
In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index...
Due to the dramatically increasing amount of multimedia contents in several ap...
An inverted index stores, for each term that appears in a collection of documents, a list of docum...