Abstract. We present a novel strategy to partition a document collection across several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vector model: each document is represented as a list recording the queries for which the document is a match, along with the rank the document obtained for each of them. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters, which group queries that return similar results, are used for collection selection. We show that this document partition strategy greatly boosts t...
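The query-vector representation and the role of query clusters in collection selection can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy query log, the 1/rank affinity, the rule that document cluster i is stored on server i, and the hand-picked cluster assignments, which stand in for the output of a real co-clustering algorithm (the co-clustering step itself is not implemented here). The helper names are hypothetical, and the sketch only handles queries that already appear in the log.

```python
from collections import defaultdict


def build_query_vectors(query_log):
    """Build a query-vector for every document that appears in the log.

    query_log maps each query string to its result list (doc ids, best first).
    Returns doc_id -> {query: rank}, where rank 1 is the top result.
    Documents never returned by any logged query get no vector here.
    """
    vectors = defaultdict(dict)
    for query, ranked_docs in query_log.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            vectors[doc][query] = rank
    return dict(vectors)


def cluster_weights(vectors, query_cluster, doc_cluster):
    """Sum query-document affinities into a (query cluster, doc cluster) table.

    The affinity of a (query, doc) pair is assumed to be 1 / rank; this is
    an illustrative choice, not necessarily the scoring used in the paper.
    """
    weights = defaultdict(float)
    for doc, qv in vectors.items():
        for query, rank in qv.items():
            weights[(query_cluster[query], doc_cluster[doc])] += 1.0 / rank
    return dict(weights)


def select_collections(query, query_cluster, weights, n_servers):
    """Rank servers for a query that appears in the log (hypothetical helper).

    Assumes document cluster i is stored on server i, so a server's score is
    the weight linking the query's cluster to that document cluster.
    """
    qc = query_cluster[query]
    scores = [weights.get((qc, server), 0.0) for server in range(n_servers)]
    return sorted(range(n_servers), key=lambda s: scores[s], reverse=True)


# Toy query log and hand-picked co-clusters (all hypothetical).
log = {
    "jaguar car": ["d1", "d2", "d3"],
    "jaguar price": ["d2", "d1"],
    "jaguar habitat": ["d4", "d5", "d1"],
}
query_cluster = {"jaguar car": 0, "jaguar price": 0, "jaguar habitat": 1}
doc_cluster = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1}

vectors = build_query_vectors(log)
weights = cluster_weights(vectors, query_cluster, doc_cluster)
print(vectors["d1"])  # {'jaguar car': 1, 'jaguar price': 2, 'jaguar habitat': 3}
print(select_collections("jaguar habitat", query_cluster, weights, 2))  # [1, 0]
```

Ranking servers from the small cluster-level table, rather than from the per-document query-vectors, keeps the selection step cheap in this sketch. For "jaguar habitat", server 1 ranks first because its document cluster holds that query's two top results, while server 0 only holds its third result.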
Clustering of related or similar objects has long been regarded as a potentially useful contribution...
For a storage system to keep pace with increasing amounts of data, a natural solution is to deploy m...
In information retrieval systems, there are three types of index partitioning schemes - term partiti...
In this paper, we introduce a new collection selection strategy to be operated in search engines wit...
In a distributed document database system, a query is processed by passing it to a set of individual...
In a distributed document database system, a query is processed by passing it to a set of individual...
In this paper we d... (http://www.emse.fr/~mbeig/PUBLIS/2002-itcc-p529-abbaci.ps.gz)
In this thesis, we present a distributed architecture for a Web search engine, based on the concept ...
In an environment of distributed text collections, the first step in the information retrieval proce...
The vast amount of scientific literature poses a challenge when one is trying to understand a previo...
Two principal query-evaluation methodologies have been described for cluster-based implementation of...
Nowadays, the dissemination of information touches the distributed world, where selecting the releva...
Abstract. An efficient way to explore a large document collection (e.g., the search results returned...
Clustering of related or similar objects has long been regarded as a potentially useful contribution...
We present an efficient document clustering algorithm that uses a term frequency vector for each doc...