Evaluation is instrumental in developing and managing effective information retrieval systems and in ensuring high levels of user satisfaction. A number of publications have shown that using crowdsourcing to obtain relevance assessments is viable. Less well understood are the limits of crowdsourcing for the assessment task, particularly for domain-specific search. We present results comparing relevance assessments gathered through crowdsourcing with those gathered from a domain expert for evaluating different search engines in a large government archive. While crowdsourced judgments rank the tested search engines in the same order as expert judgments, crowdsourced workers appear unable to distinguish different levels ...
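The system-level comparison described above, i.e. scoring every search engine under each set of judgments and then correlating the two induced rankings, can be made concrete with a rank-correlation statistic such as Kendall's tau. The Python sketch below is a minimal illustration of that idea only; the data layout (per-topic ranked runs and sets of relevant documents), the choice of precision@10 as the effectiveness measure, and all function names are assumptions made for the example rather than the setup used in the study.

from scipy.stats import kendalltau

def precision_at_k(ranked_docs, relevant_docs, k=10):
    # Fraction of the top-k retrieved documents that are judged relevant.
    return sum(1 for d in ranked_docs[:k] if d in relevant_docs) / k

def mean_score(run, qrels, k=10):
    # Average precision@k of one engine's run over all judged topics.
    # run: {topic_id: [doc_id, ...]}; qrels: {topic_id: set of relevant doc_ids}.
    return sum(precision_at_k(run.get(t, []), rel, k) for t, rel in qrels.items()) / len(qrels)

def rank_agreement(runs, expert_qrels, crowd_qrels, k=10):
    # Score every engine under both judgment sets and correlate the two rankings.
    engines = sorted(runs)
    expert_scores = [mean_score(runs[e], expert_qrels, k) for e in engines]
    crowd_scores = [mean_score(runs[e], crowd_qrels, k) for e in engines]
    tau, p_value = kendalltau(expert_scores, crowd_scores)
    return tau, p_value

A tau close to 1 would indicate that the two sets of judgments order the engines almost identically, which is consistent with the system-level agreement reported in the abstract even where per-document agreement on relevance grades is weaker.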
Crowdsourcing has gained a lot of attention as a viable approach for conducting IR evaluations. This...
Crowdsourcing has become an alternative approach to collect relevance judgments at large scale. In t...
Information Retrieval (IR) researchers have often used existing IR evaluation collections and transf...
The performance of information retrieval (IR) systems is commonly evaluated using a test set with kn...
Crowdsourcing relevance judgments for the evaluation of search engines is used increasingly to overc...
Crowdsourcing has become an alternative approach to collect relevance judgments at scale thanks to t...
Test collections are extensively used to evaluate information retrieval systems in laboratory-based ev...
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repe...
Information Retrieval systems rely on large test collections to measure their effectiveness in retri...
We consider the problem of acquiring relevance judgements for information retrieval (IR) ...