Interleaved comparison methods, which compare rankers using click data, are a promising alternative to traditional information retrieval evaluation methods that require expensive explicit judgments. A major limitation of these methods is that they assume access to live data, meaning that new data must be collected for every pair of rankers compared. We investigate the use of previously collected click data (i.e., historical data) for interleaved comparisons. We start by analyzing to what degree existing interleaved comparison methods can be applied and find that a recent probabilistic method allows such data reuse, even though it is biased when applied to historical data. We then propose an interleaved comparison method that is based on the...
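The following is a minimal Python sketch of a probabilistic interleaved comparison. It is not the method proposed in the abstract above. It assumes a softmax-style formulation in which the document at rank r receives weight proportional to 1/r^τ, with τ = 3 as an assumed default. It also assumes that both rankers rank the same set of documents. The function names `interleave` and `expected_outcome` and the credit rule are hypothetical illustrations. Under that rule, a click credits the ranker assigned to its slot, and the outcome is averaged over every assignment that could have produced the shown list.

```python
import itertools
import random


def rank_weights(ranking, tau=3.0):
    """Unnormalised softmax-style weights: the document at rank r gets 1 / r**tau."""
    return {doc: 1.0 / rank ** tau for rank, doc in enumerate(ranking, start=1)}


def draw_prob(weights, doc, remaining):
    """Probability of drawing `doc` when sampling only among `remaining` documents."""
    return weights[doc] / sum(weights[d] for d in remaining)


def interleave(ranking_a, ranking_b, length, tau=3.0, rng=random):
    """Build an interleaved list; also return which ranker (0=A, 1=B) supplied each slot.

    Assumes both rankers rank the same set of documents.
    """
    weights = (rank_weights(ranking_a, tau), rank_weights(ranking_b, tau))
    remaining = set(ranking_a)
    shown, assignment = [], []
    for _ in range(min(length, len(ranking_a))):
        which = rng.randrange(2)  # pick ranker A or B uniformly at random
        docs = [d for d in ranking_a if d in remaining]
        doc = rng.choices(docs, weights=[weights[which][d] for d in docs])[0]
        shown.append(doc)
        assignment.append(which)
        remaining.discard(doc)  # each document is shown at most once
    return shown, assignment


def expected_outcome(ranking_a, ranking_b, shown, clicked, tau=3.0):
    """Compare rankers from clicks on `shown`, marginalising over every assignment
    that could have produced `shown`. Negative favours A, positive favours B."""
    weights = (rank_weights(ranking_a, tau), rank_weights(ranking_b, tau))
    total_likelihood = 0.0
    weighted_sum = 0.0
    # Enumerating all 2**len(shown) assignments is only practical for short lists.
    for assignment in itertools.product((0, 1), repeat=len(shown)):
        remaining = set(ranking_a)
        likelihood = 1.0  # P(shown | assignment); the uniform prior cancels out
        for doc, which in zip(shown, assignment):
            likelihood *= draw_prob(weights[which], doc, remaining)
            remaining.discard(doc)
        credit = [0, 0]  # clicks credited to ranker A and ranker B
        for doc, which in zip(shown, assignment):
            if doc in clicked:
                credit[which] += 1
        outcome = (credit[1] > credit[0]) - (credit[1] < credit[0])
        total_likelihood += likelihood
        weighted_sum += likelihood * outcome
    return weighted_sum / total_likelihood


if __name__ == "__main__":
    ranking_a = ["d1", "d2", "d3", "d4"]
    ranking_b = ["d4", "d3", "d2", "d1"]
    shown, _ = interleave(ranking_a, ranking_b, length=3, rng=random.Random(7))
    # Suppose the user clicked only the top result of the interleaved list.
    print(shown, expected_outcome(ranking_a, ranking_b, shown, clicked={shown[0]}))
```

The outcome is averaged over all consistent assignments rather than read off the single assignment that was actually sampled. This trades computation for lower variance. Because the enumeration is exponential in list length, the sketch only suits short result lists.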
Traditional retrieval evaluation uses explicit relevance judgments which are expensive to collect. R...
Evaluation in information retrieval takes one of two forms: collection-based offline evaluation, and...
The Cranfield evaluation method has some disadvantages, including its high cost in labor and inadequ...
Interleaved comparison methods, which compare rankers using click data, are a promising alternative ...
Ranker evaluation is central to the research into search engines, be it to compare rankers or to pro...
Ranker evaluation is central to the research into search engines, be it to compare rankers or to pro...
Interleaving is an online evaluation method to compare two alternative ranking functions based on th...
Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an incr...
Online evaluation methods for information retrieval use implicit signals such as clicks from users t...
Evaluation methods for information retrieval systems come in three types: offline evaluation, using ...
Interleaving is an increasingly popular technique for evaluating information retrieval systems based...
Information retrieval evaluation most often involves manually assessing the relevance of particular...
A result page of a modern web search engine is often much more complicated than a simple list of "te...
The gold standard for online retrieval evaluation is A/B testing. Rooted in the idea of a controlled ...
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one...