Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.
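The distinction between distributing an index by document and by term can be illustrated with a minimal sketch. It is not the paper's implementation: the toy collection `docs`, the round-robin assignment, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition_by_document(docs, num_nodes):
    """Document distribution: each node indexes only its own subset of documents.

    Round-robin assignment is an assumption of this sketch.
    """
    shards = [dict() for _ in range(num_nodes)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        shards[i % num_nodes][doc_id] = text
    return [build_inverted_index(shard) for shard in shards]


def partition_by_term(docs, num_nodes):
    """Term distribution: each node holds the complete postings lists of its terms.

    Round-robin assignment over sorted terms is an assumption of this sketch.
    """
    full_index = build_inverted_index(docs)
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(full_index)):
        nodes[i % num_nodes][term] = full_index[term]
    return nodes


def nodes_touched(nodes, query_terms):
    """Return the ids of the nodes holding postings for any query term."""
    return sorted(
        node_id
        for node_id, index in enumerate(nodes)
        if any(term in index for term in query_terms)
    )


# Hypothetical toy collection, used only to exercise the functions above.
docs = {
    1: "parallel query processing",
    2: "load balancing query",
    3: "parallel index",
    4: "term index balancing",
}

by_doc = partition_by_document(docs, num_nodes=2)
by_term = partition_by_term(docs, num_nodes=2)
```

In this toy setup, a query for the single term `query` involves both nodes under document distribution but only one node under term distribution. Under term distribution the work a node receives depends on which terms it holds, so popular terms can concentrate load on a few nodes, which is why load balancing matters for such architectures.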
This paper presents a case study of parallel retrieval of abstracts for an SDI service on a local are...
A consensus on parallel architecture for very large database management has emerged. This architectu...
We present a general method of parallel query processing that allows scalable performance on distrib...
Information retrieval systems often have to deal with very large amounts of data. They must be able ...
The problem of efficiently retrieving and ranking documents from a huge collection according to their r...
Dynamic load balancing is a prerequisite for effectively utilizing large parallel database systems. ...
Two principal query-evaluation methodologies have been described for cluster-based implementation of...
In information retrieval systems, there are three types of index partitioning schemes - term partiti...
Simulation and analysis have shown that selective search can reduce the cost of large-scale distribu...
In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index...
We consider the execution of multi-join queries in a hierarchical parallel system, i.e., a shared-no...
The proliferation of the world's "information highways" has renewed interest in efficie...
Clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that c...
As information explodes across the Internet and intranets, information retrieval (IR) systems must c...
Definition: The goal of parallel query execution is minimizing query response...