Rater effects are a concern whenever different raters score candidates' responses. This study demonstrates how models can be used to evaluate the scores assigned by raters on a free-response English essay question from a high-stakes Examinations Board in the Caribbean Region. In addition, it uses these models to assess the validity of the grades based on these scores. In this investigation, a new classical test theory (CTT) model was created and compared with many-faceted Rasch measurement (MFRM) models of rater severity, examining the effects of modelling individual raters versus Table group severity, and the additional influence of the Table Leaders (considered by policy as 'standard bearers'). Models for the whole marking period were...
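The MFRM models referred to above build on the many-facet Rasch model. As a minimal sketch, assuming the standard rating-scale form log(P_k / P_{k-1}) = B_n - D_i - C_j - F_k (examinee ability B_n, item difficulty D_i, rater severity C_j, category threshold F_k), the snippet below computes score-category probabilities for one examinee scored by one rater. The function names and parameter values are hypothetical illustrations, not the study's model, which additionally considers Table groups and Table Leaders.

```python
import math


def mfrm_category_probs(ability, difficulty, severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Assumed form: log(P_k / P_{k-1}) = ability - difficulty - severity - thresholds[k-1]
    for k = 1..m, with category 0 as the reference category.
    """
    # Cumulative sums of the adjacent-category logits give unnormalised log-probabilities.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (ability - difficulty - severity - tau))
    # Normalise with a numerically stable softmax.
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected rating given category probabilities for categories 0..m."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical example: the same examinee and thresholds, lenient vs severe rater.
thresholds = [-1.0, 0.0, 1.0]
lenient = expected_score(mfrm_category_probs(1.0, 0.0, -0.5, thresholds))
severe = expected_score(mfrm_category_probs(1.0, 0.0, 0.5, thresholds))
print(f"lenient rater: {lenient:.3f}, severe rater: {severe:.3f}")
```

Holding the examinee and thresholds fixed, raising the rater's severity parameter lowers the expected score, which is how a severe rater shows up in this kind of model.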
This study examined the rater severity of instructors using a multi-trait rubric in a freshman comp...
Every year outcomes from public examinations in the UK rise: politicians congratulate pupils on thei...
This study aims to compare five rating behaviors of eight raters: four novice raters and four experien...
This study investigates the impact of rater severity and the stability of rater severity over time o...
The study aims to investigate the extent to which raters exhibit tendencies towards being overly sev...
© 2011 Dr. Negar KeshavarzMehr. In assessing oral language proficiency, the scores given to interviewe...
An approach to essay grading based on signal detection theory (SDT) is presented. SDT offers a basis...
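As an illustration of the SDT vocabulary only, and not the specific grading model presented in that work, the sketch below assumes a binary decision (a rater labelling an essay "high" or "low") checked against a hypothetical reference classification. It computes the equal-variance indices d' = z(H) - z(F) for discrimination and c = -(z(H) + z(F)) / 2 for the rater's decision criterion. The function name, the 0.5 log-linear correction, and the counts are assumptions made for illustration.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Equal-variance SDT indices (d', c) for a binary rating decision.

    hits: reference-high essays the rater labelled high
    misses: reference-high essays the rater labelled low
    false_alarms: reference-low essays the rater labelled high
    correct_rejections: reference-low essays the rater labelled low
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction (add 0.5 per cell) keeps rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts: a balanced rater and a stricter one, 50 essays per class.
balanced = sdt_indices(40, 10, 10, 40)
strict = sdt_indices(30, 20, 5, 45)
print("balanced d', c:", balanced)
print("strict   d', c:", strict)
```

A positive c marks a rater who withholds the "high" label more readily, which corresponds to what the rater-severity literature calls a stricter rater, while d' summarises how well the rater separates the two reference classes.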
Scoring language learners’ writing exams is a difficult task for graders since many task-relevant or...
Essay scoring operates both in the classroom and in high-stakes testing and the results of essay sco...
The purpose of this study is to investigate the reliability of essay test scores when the scores are...
The current study, a doctoral dissertation, investigates the gap between the nature of ESL perfor...
The psychometric characteristics of the Test of Written English (TWE) rating scale were explored. Ra...
This study investigates the effects of preselected model compositions and a multiple weighted trait ...
© 2000 Dr. Thomas James Nathaniel Lumley. The primary purpose of this study is to investigate the proc...
We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) sys...