Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well-known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query evaluation. The in-place and re-merge strategies are benchmarked against the baseline of a complete re-build. Our experiments with large volumes of web data show that re-merge is the fastest approach if large buffers are available, but that even a simple implementation of...
Full-text information retrieval systems have traditionally been designed for archival environments....
Intense regulatory focus on sec...
Large web search engines process billions of queries each day over tens of billions of documents wit...
Indexes are the key technology underpinning efficient text search. A range of algorithms have been d...
In-place and merge-based index maintenance are the two main competing strategies for on-line index ...
Previous on-line index maintenance strategies are mainly designed for document insertions without co...
Inverted index structures are a core element of current text retrieval systems. They can be construc...
With the proliferation of the world's "information highways" a renewed interest in efficient docum...
For text retrieval systems, the assumption that all data structures reside in main memory is increas...
For free-text search over rapidly evolving corpora, dynamic update of inverted indices is a basic re...
Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniqu...
The original publication is available at www.springerlink.com. Recent work on incremental crawling has...
For text retrieval systems, the assumption that all data structures reside in main memory is increas...
All practical text search systems use inverted indexes to quickly resolve user queries. Offline inde...
Efficient construction of inverted indexes is essential to provision of search over large collection...