Abstract. Crowdsourcing relevance judgments for test collection construction is attractive because the practice has the possibility of being more affordable than hiring high-quality assessors. A problem faced by all crowdsourced judgments – even judgments formed from the consensus of multiple workers – is that there will be differences in the judgments compared to the judgments produced by high-quality assessors. For two TREC test collections, we simulated errors in sets of judgments and then measured the effect of these errors on effectiveness measures. We found that some measures appear to be more tolerant of errors than others. We also found that to achieve high rank correlation in the ranking of retrieval systems requires conservative ...
Relevance judgment of human assessors is inherently subjective and dynamic when evaluation datasets ...
This paper investigates the agreement of relevance assessments between official TREC judgments and t...
In Information Retrieval evaluation, the classical approach of adopting binary relevance judgments h...
In recent years, gathering relevance judgments through non-topic originators has become an increasin...
Crowdsourcing has become an alternative approach to collect relevance judgments at scale thanks to t...
Test collection is extensively used to evaluate information retrieval systems in laboratory-based ev...
Abstract. We present two new measures of retrieval effectiveness, inspired by Graded Average Precis...
The agreement between relevance assessors is an important but understudied topic in the Information ...
While crowdsourcing offers a low-cost, scalable way to collect relevance judgments, lack...
Batch evaluation techniques are often used to measure and compare the performance ...
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where o...
Information Retrieval (IR) researchers have often used existing IR evaluation collections and transf...
Relevance assessments are a key component for test collection-based evaluation of information retriev...