We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data collected over a period of 5 years. Empirical results suggest that the model fits well both for total scores and for individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50-point scale, while 10% of essays would receive a score at least 5 points away from their actual quality. Most of the impact is due to rater unreliability, not rater bias.
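To make the distinction between rater bias and rater unreliability concrete, the following is a minimal simulation sketch, not the model used in the paper. It assumes a simple additive error structure in which each rater has a fixed bias and each individual scoring carries independent noise. The function name `simulate_rater_impact` and the values of `bias_sd` and `noise_sd` are hypothetical choices for illustration only; they are not estimates from the data.

```python
import random
import statistics


def simulate_rater_impact(n_essays=10000, n_raters=50,
                          bias_sd=0.5, noise_sd=3.0, seed=0):
    """Simulate scoring errors under an assumed additive rater model.

    Assumed (hypothetical) structure: error = rater bias + scoring noise.
    bias_sd and noise_sd are illustrative values, not fitted parameters.
    """
    rng = random.Random(seed)
    # Rater bias: a systematic severity or leniency, fixed per rater.
    biases = [rng.gauss(0.0, bias_sd) for _ in range(n_raters)]

    abs_errors = []
    for _ in range(n_essays):
        rater = rng.randrange(n_raters)  # essay assigned to a random rater
        # Rater unreliability: independent noise on each scoring occasion.
        error = biases[rater] + rng.gauss(0.0, noise_sd)
        abs_errors.append(abs(error))

    # Median absolute deviation of the score from the essay's true quality.
    median_abs = statistics.median(abs_errors)
    # Share of essays scored at least 5 points away from true quality.
    share_5 = sum(e >= 5.0 for e in abs_errors) / n_essays
    return median_abs, share_5


if __name__ == "__main__":
    median_abs, share_5 = simulate_rater_impact()
    print(f"median |error|: {median_abs:.2f} points")
    print(f"share with |error| >= 5: {share_5:.1%}")
```

In this sketch the noise term dominates the error variance whenever `noise_sd` is much larger than `bias_sd`, which is the sense in which most of the impact would come from unreliability rather than bias. The sketch ignores the bounds of the 50-point scale and any rubric structure.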
The paper presents a study of the performance variations of the Bayesian model of peer assessment implement...
How common patterns of rater errors may be detected in a large-scale performance assessment setting ...
The current study as a doctorate dissertation investigates the gap between the nature of ESL perfor...
We develop a Bayesian hierarchical model for the analysis of ordinal data from multirater ranking st...
This study describes how latent trait models, specifically the multi-faceted Rasch model, may be app...
In educational measurement, various methods have been proposed to infer student proficiency from the...
The use of constructed response (CR) items or performance tasks to assess test takers' ability has ...
This dissertation is comprised of three papers that propose and apply psychometric models to deal wi...
An approach to essay grading based on signal detection theory (SDT) is presented. SDT offers a basis...
Rater effects are of concern when different raters score candidates' responses. This study demonstra...
© 2019 by the National Council on Measurement in Education. Researchers have explored a variety of to...
In educational and psychological studies, psychometric methods are involved in the measurement of co...
Following the debate between the proponents of the ranking and rating methods for the measurement of...
To find population proficiency distributions, a two-level hierarchical linear model may be applied t...