We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term should remain in the index or not. The decision is made based on the term's contribution to the document's Kullback-Leibler divergence from the text collection's global language model. Our technique can be used to decrease the size of the index by over 90%, at only a minor decrease in retrieval effectiveness. It thus allows us to make the index small enough to fit entirely into the main memory of a single PC, even for large text collections containing millions of documents. This results in great efficiency gains, superior to those of earlier pruning methods,...
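The pruning criterion described above can be illustrated with a minimal sketch. It scores each term in a document by its contribution to the Kullback-Leibler divergence between the document's language model and the collection's, P_d(t) * log(P_d(t) / P_C(t)), and keeps only the highest-scoring terms per document. Several details are assumptions not stated in the abstract: the unsmoothed maximum-likelihood estimates, the fixed per-document cutoff `keep_per_doc`, the alphabetical tie-break, and the function names `kld_term_scores` and `prune_index`, which are hypothetical.

```python
import math
from collections import Counter


def kld_term_scores(doc_tokens, collection_counts, collection_len):
    """Score each term by its contribution to KL(document || collection).

    Assumption: both language models are plain maximum-likelihood
    estimates (relative frequencies) with no smoothing.
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = {}
    for term, count in tf.items():
        p_doc = count / doc_len
        p_coll = collection_counts[term] / collection_len
        scores[term] = p_doc * math.log(p_doc / p_coll)
    return scores


def prune_index(docs, keep_per_doc):
    """Build an inverted index that keeps, for each document, only the
    postings of its `keep_per_doc` highest-scoring terms.

    `docs` maps a document id to its list of tokens. The fixed cutoff is
    an illustrative assumption; a relative cutoff would work the same way.
    """
    # Collection-wide term counts define the global language model.
    collection_counts = Counter()
    for tokens in docs.values():
        collection_counts.update(tokens)
    collection_len = sum(collection_counts.values())

    index = {}
    for doc_id, tokens in docs.items():
        if not tokens:
            continue  # an empty document contributes no postings
        scores = kld_term_scores(tokens, collection_counts, collection_len)
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        for term in ranked[:keep_per_doc]:
            index.setdefault(term, []).append(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "d1": ["index", "pruning", "the", "the"],
        "d2": ["the", "query", "the", "the"],
        "d3": ["the", "cat"],
    }
    print(prune_index(docs, keep_per_doc=1))
```

In this toy collection a term such as "the", which is about as frequent in each document as in the collection overall, receives a low or negative score and is pruned first, while terms that distinguish a document from the collection survive.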
This paper introduces a new weighting scheme in information retrieval. It also proposes usi...
This thesis primarily investigates lossy compression of an inverted index. Two approaches of lossy ...
An element-index is a crucial mechanism for supporting content-only (CO) queries over XML collection...
We compared the term- and document-centric static index pruning approaches as described in the liter...
Carterette, Ben. Static index pruning methods have been proposed to reduce the index size of informati...
We compare the term- and document-centric static index pruning approaches as described in the litera...
Document-centric static index pruning methods provide smaller indexes and faster query tim...
Static index pruning techniques aim at removing from the posting lists of an inverted file the refer...
Static index pruning techniques permanently remove a presumably redundant part of an inverted file, ...
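To address the problem that web pages vary widely in quality and differ greatly in importance, we propose a static index pruning method that sets the degree of pruning for each page according to its importance, and we validate it on the GOV2 dataset. Experimental results show that this method demonstrates that static index pruning can greatly reduce storage requirements and improve query...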
This paper discusses a novel approach developed for static index pruning that takes into account the...
We propose incorporating query views in a number of static pruning strategies, namely term-centric, ...
In this chapter we describe a set of index structures that are suitable for supporting search querie...
The presence of spam in a document ranking is a major issue for Web search engines. Common approache...