Ranker evaluation is central to the research into search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. We propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and cons...
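To ground these definitions, the sketch below shows one common instance of an interleaved comparison, team-draft interleaving, in which two rankers alternately contribute documents to a single result list and each click is credited to the ranker that contributed the clicked document. The function names and the simple click-counting rule are illustrative assumptions for this sketch, not the specific comparison methods analyzed in the work above.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Team-draft interleaving: in each round a coin flip decides which ranker
    picks first; each ranker then adds its highest-ranked document not already
    shown, and that document is credited to the ranker's team."""
    a, b = list(ranking_a), list(ranking_b)      # work on copies of the rankings
    interleaved, team_a, team_b = [], set(), set()
    while len(interleaved) < length and (a or b):
        order = [(a, team_a), (b, team_b)]
        if random.random() < 0.5:                # randomize which team picks first
            order.reverse()
        for ranking, team in order:
            while ranking and ranking[0] in interleaved:
                ranking.pop(0)                   # skip documents already shown
            if ranking and len(interleaved) < length:
                doc = ranking.pop(0)
                interleaved.append(doc)
                team.add(doc)
    return interleaved, team_a, team_b

def interleaved_outcome(clicked, team_a, team_b):
    """Credit each click to the team that contributed the clicked document;
    return +1 if ranker A wins the impression, -1 if B wins, 0 for a tie."""
    wins_a, wins_b = len(clicked & team_a), len(clicked & team_b)
    return (wins_a > wins_b) - (wins_a < wins_b)
```

Averaging the per-impression outcome over many queries yields the comparison's estimate of which ranker is preferred; fidelity and soundness, as defined above, concern whether that expected outcome and its estimate can be trusted.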
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one...
The amount of digital data we produce every day far surpasses our ability to process this data, and ...
The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled ...
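For contrast with interleaving, a minimal sketch of an AB comparison over click data might look as follows; the deterministic user bucketing and the click-through-rate metric are assumptions made purely for illustration, and any real deployment would add significance testing.

```python
import hashlib

def assign_bucket(user_id, n_buckets=2):
    """Deterministically hash a user id into a bucket (0 = ranker A, 1 = ranker B)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def ab_compare(impressions):
    """impressions: iterable of (user_id, clicked) pairs observed online.

    Returns per-bucket click-through rates; the ranker whose bucket has the
    higher CTR is preferred in this simplified sketch."""
    clicks, views = [0, 0], [0, 0]
    for user_id, clicked in impressions:
        bucket = assign_bucket(user_id)
        views[bucket] += 1
        clicks[bucket] += int(clicked)
    return [c / v if v else 0.0 for c, v in zip(clicks, views)]
```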
Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an incr...
Evaluation methods for information retrieval systems come in three types: offline evaluation, using ...
Interleaved comparison methods, which compare rankers using click data, are a promising alternative ...
Interleaving is an online evaluation method to compare two alternative ranking functions based on th...
A result page of a modern web search engine is often much more complicated than a simple list of "te...
Online evaluation methods for information retrieval use implicit signals such as clicks from users t...
A result page of a modern search engine often goes beyond a simple list of "10 blue links." Many spe...
Interleaving is an increasingly popular technique for evaluating information retrieval systems based...