Abstract. We consider the problem of acquiring relevance judgements for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used majority voting (MV), which determines the document relevance based on the label that received the most votes, treating all the workers equally. The second is a probabilistic model that concurrently estimates the document relevance and the workers' accuracy using expectation maximization (EM). We run simulations and conduct experiments with c...
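The two aggregation methods in the abstract above can be sketched briefly. This is a minimal illustration, not the paper's implementation: `majority_vote` picks the most-voted label per document, and `em_binary` is a simplified one-coin Dawid-Skene-style EM that alternates between estimating each document's probability of relevance and each worker's accuracy. The data layout (doc id mapped to a list of worker/label pairs), the 0.8 initial accuracy, and the 0.5 relevance prior are all assumptions made for the sketch.

```python
from collections import Counter

def majority_vote(labels):
    """Majority voting: pick the label with the most votes per document.

    labels: dict mapping doc id -> list of (worker id, label) pairs.
    """
    return {d: Counter(y for _, y in ls).most_common(1)[0][0]
            for d, ls in labels.items()}

def em_binary(labels, n_iter=50, prior=0.5):
    """Simplified one-coin EM aggregation for binary relevance labels:
    each worker w has a single accuracy acc[w]; we alternate between
    estimating P(relevant) per document (E-step) and re-estimating the
    worker accuracies (M-step)."""
    workers = {w for ls in labels.values() for w, _ in ls}
    acc = {w: 0.8 for w in workers}  # assumed optimistic initialization
    post = {}
    for _ in range(n_iter):
        # E-step: posterior probability that each document is relevant,
        # assuming workers label independently given the true relevance.
        for d, ls in labels.items():
            p1, p0 = prior, 1.0 - prior
            for w, y in ls:
                p1 *= acc[w] if y == 1 else 1.0 - acc[w]
                p0 *= acc[w] if y == 0 else 1.0 - acc[w]
            post[d] = p1 / (p1 + p0)
        # M-step: a worker's accuracy is the expected fraction of their
        # labels that agree with the (soft) inferred relevance.
        for w in workers:
            num = den = 0.0
            for d, ls in labels.items():
                for ww, y in ls:
                    if ww == w:
                        num += post[d] if y == 1 else 1.0 - post[d]
                        den += 1.0
            # clamp away from 0/1 to keep the E-step numerically safe
            acc[w] = min(max(num / den, 1e-6), 1.0 - 1e-6)
    return post, acc
```

On data where two workers agree and a third consistently disagrees, MV treats all three equally, while EM learns the dissenting worker's low accuracy and down-weights their votes.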
Evaluation is instrumental in the development and management of effective information retrieval syst...
We consider the problem of optimally allocating a limited budget to acquire relevance judgments when...
Information Retrieval systems rely on large test collections to measure their effectiveness in retri...
The performance of information retrieval (IR) systems is commonly evaluated using a test set with kn...
In Information Retrieval (IR) evaluation, preference judgments are collected by presenting to the as...
Crowdsourcing is a popular technique to collect large amounts of human-generated labels, such as rel...
Crowdsourcing has become an alternative approach to collect relevance judgments at scale thanks to t...
Crowdsourcing relevance judgments for the evaluation of search engines is used increasingly to overc...
Abstract. Pooling is a document sampling strategy commonly used to collect relevance judgments when ...
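The pooling strategy named in the abstract above can be sketched in a few lines. This is a hedged illustration of standard depth-k pooling, not the paper's own procedure: the set of documents to be judged is the union of the top-k results from each participating system's ranked run. The function name and input layout are assumptions.

```python
def depth_k_pool(runs, k):
    """Depth-k pooling: the pool of documents sent to assessors is the
    union of the top-k documents returned by each participating system.

    runs: list of ranked document-id lists, one per system.
    k:    pool depth (how far down each ranking to sample).
    """
    return {doc for run in runs for doc in run[:k]}
```

Documents outside the pool are typically treated as non-relevant, which is what makes pool depth a key cost/completeness trade-off when judging budgets are limited.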
The availability of test collections in the Cranfield paradigm has significantly benefited the developme...