Online evaluation methods for information retrieval use implicit signals, such as clicks, to infer users' preferences between rankers. A highly sensitive way of inferring these preferences is through interleaved comparisons. Recently, interleaved comparison methods that allow for the simultaneous evaluation of more than two rankers have been introduced. These so-called multileaving methods are even more sensitive than their interleaving counterparts. Probabilistic interleaving, whose main selling point is the potential for reuse of historical data, has no multileaving counterpart yet. We propose probabilistic multileave and empirically show that it is highly sensitive and unbiased. An important implication of this result is that historical interaction data can be reused, allowing rankers to be compared with far less fresh user interaction data.
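To make the mechanism concrete, below is a minimal Python sketch of probabilistic multileaving under simplifying assumptions: each ranker is reduced to a softmax-over-rank distribution P(d) proportional to 1/rank(d)^tau with tau = 3, as in probabilistic interleaving, and click credit is computed by marginalizing over which ranker could have produced each clicked document, rather than by the assignment-sampling estimator of the full method. All function and variable names are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

TAU = 3.0  # rank-decay parameter; tau = 3 is the value used in probabilistic interleaving

def rank_distribution(ranking, excluded):
    """Softmax-over-rank distribution P(d) ~ 1 / rank^TAU, renormalized
    after removing documents already placed in the multileaved list."""
    weights = {d: 1.0 / (rank + 1) ** TAU
               for rank, d in enumerate(ranking) if d not in excluded}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def multileave(rankings, length):
    """Build a multileaved list: at each position, pick one of the rankers
    uniformly at random and sample a document from its distribution."""
    shown = []
    while len(shown) < length:
        dist = rank_distribution(random.choice(rankings), set(shown))
        if not dist:  # every document has already been shown
            break
        docs = list(dist)
        probs = [dist[d] for d in docs]
        shown.append(random.choices(docs, weights=probs)[0])
    return shown

def credit_clicks(rankings, shown, clicked):
    """Credit each ranker for a click in proportion to the probability that
    it would have generated the clicked document at that position."""
    scores = defaultdict(float)
    for pos, d in enumerate(shown):
        if d not in clicked:
            continue
        excluded = set(shown[:pos])
        probs = [rank_distribution(r, excluded).get(d, 0.0) for r in rankings]
        total = sum(probs)
        if total > 0.0:
            for i, p in enumerate(probs):
                scores[i] += p / total
    return scores  # inferred preferences follow from comparing scores

# Illustrative usage with three toy rankers over the same five documents.
rankers = [["d1", "d2", "d3", "d4", "d5"],
           ["d2", "d1", "d4", "d3", "d5"],
           ["d5", "d4", "d3", "d2", "d1"]]
shown = multileave(rankers, length=4)
print(shown, credit_clicks(rankers, shown, clicked={shown[0]}))
```

The property that enables reuse of historical data is inherited from probabilistic interleaving: every ranker assigns non-zero probability to every document at every position, so credit can be computed for clicks on any observed list, not only lists the ranker itself would have produced.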
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one...
A result page of a modern web search engine is often much more complicated than a simple list of "te...
Evaluation methods for information retrieval systems come in three types: offline evaluation, using ...
Ranker evaluation is central to the research into search engines, be it to compare rankers or to pro...
Interleaved comparison methods, which compare rankers using click data, are a promising alternative ...
Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an incr...
Interleaving is an online evaluation method to compare two alternative ranking functions based on th...
Modern search systems are based on dozens or even hundreds of ranking features. The dueling bandit g...
Interleaving is an increasingly popular technique for evaluating information retrieval systems based...
The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled ...
The amount of digital data we produce every day far surpasses our ability to process this data, and ...