The relative performance of retrieval systems when evaluated on one part of a test collection may bear little or no similarity to the relative performance measured on a different part of the collection. In this paper we report the results of a detailed study of the impact that different sub-collections have on retrieval effectiveness, analyzing the effect over many collections and with different approaches to subdividing the collections. The effect is shown to be substantial, affecting comparisons between retrieval runs that are statistically significant. Some possible causes of the effect are investigated, and the implications of this work are examined for test collection design and for the strength of conclusions one can draw from e...
The retrieval effectiveness of large document collections is normally assessed by using small subsec...
Evaluating retrieval systems in a controlled environment with a large set of topics has been the cor...
Many IR effectiveness measures are motivated from intuition, theory, or user studies. In general, mo...
Past work showed that significant inconsistencies between retrieval results occurred on different te...
The purpose of this article is to bring attention to the problem of variations in relevance assessm...
Test collections are extensively used in the evaluation of information retrieval systems. Crucial to...
Introduction. Evaluation is highly important for designing, developing and maintaining effective inf...
Test collection design eliminates sources of user variability to make statistical comparisons among ...
© 2010 Dr. William Edward Webber. Full-text retrieval systems employ heuristics to match documents to ...
© 2011 Dr. Sri Devi Ravana. Comparative evaluations of information retrieval systems using test collec...
Despite the bulk of research studying how to more accurately compare the performance of IR systems, ...
Use of test collections and evaluation measures to assess the effectiveness of information retrieval...
We explore the implications of using query variations for evaluating information retrieval systems a...
Several recent studies have explored the interaction effects between topics, systems, corpora, and c...