Because term occurrence frequencies are highly skewed (e.g., Zipf's law), it is unlikely that any single technique for indexing text can do well in all situations. In this paper we propose a hybrid approach to indexing text and show how it can outperform the traditional inverted B-tree index in storage overhead, in retrieval time, and, for dynamic databases, in insertion time, for both single-term and multiple-term queries. We demonstrate the benefits of our technique on a database of stories from the Associated Press news wire, and we provide formulae and guidelines for choosing the design parameters optimally in real applications.
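The skew that the abstract appeals to can be made concrete with a small sketch. The Python snippet below is illustrative only: it assumes an idealized Zipf law with exponent 1 (frequency proportional to 1/rank), and the function names `zipf_frequencies` and `share_of_top`, the vocabulary size, and the occurrence total are hypothetical choices, not values taken from the paper.

```python
def zipf_frequencies(vocab_size, total_occurrences):
    """Expected occurrence count per rank under an idealized Zipf law.

    Assumption: frequency of the term at rank r is proportional to 1/r.
    """
    # Normalizing constant: the harmonic number H_vocab_size.
    harmonic = sum(1.0 / r for r in range(1, vocab_size + 1))
    return [total_occurrences / (r * harmonic) for r in range(1, vocab_size + 1)]


def share_of_top(freqs, fraction):
    """Fraction of all occurrences covered by the most frequent terms.

    `freqs` must be sorted by descending frequency (rank order).
    """
    top = max(1, int(len(freqs) * fraction))
    return sum(freqs[:top]) / sum(freqs)


if __name__ == "__main__":
    # Hypothetical collection: 100,000 distinct terms, 1,000,000 occurrences.
    freqs = zipf_frequencies(100_000, 1_000_000)
    print(f"share held by top 1% of terms: {share_of_top(freqs, 0.01):.2f}")
    print(f"expected count at rank 1:      {freqs[0]:.0f}")
    print(f"expected count at last rank:   {freqs[-1]:.2f}")
```

Under these assumptions, the top 1% of terms accounts for more than half of all occurrences, which is the kind of imbalance under which a single index layout is unlikely to suit every term.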
Text search engines return a set of k documents ranked by similarity to a query. Typically, document...
An inverted index stores, for each term that appears in a collection of documents, a list of docum...
The issue of reducing the space overhead when indexing large text databases is becoming more and mor...
Efficient construction of inverted indexes is essential to provision of search over large collection...
The technology underlying text search engines has advanced dramatically in the past decade. The deve...
Inverted index structures are a core element of current text retrieval systems. They can be construc...
For free-text search over rapidly evolving corpora, dynamic update of inverted indices is a basic re...
Retrieval effectiveness depends on how terms are extracted and indexed. For Chinese text (and others...
Abstract: Full-text database systems require an index to allow fast access to documents based on th...
In-place and merge-based index maintenance are the two main competing strategies for on-line index ...
Query processing with precomputed term pair lists can improve efficiency for some queries, but suff...
Intersecting inverted indexes is a fundamental operation for many applications in information retrie...
This thesis describes the development and setup of hybrid index structures. They are access methods ...
The inverted index supports efficient full-text searches on natural language text collections. It re...