This paper describes a study on rater training that involved the analysis of ratings given to English-as-a-Second-Language (ESL) compositions by 8 inexperienced and 8 experienced raters both before and after rater training, using FACETS (Linacre, 1990, 1993), which provides measures of rater severity and consistency. The testing text was a 50-minute composition essay, with 2 prompts, from the ESL Placement Examination (ESLPE) at the University of California, Los Angeles. Compositions were rated using the ESLPE Rating Scale on content, rhetorical control, and language. Each essay was read by two raters, primarily ESL faculty and teaching assistants, and the scores averaged. All raters attended mandatory composition rater training. FACETS pro...
Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners'...
This dissertation studied the inter-rater reliability of the Oral Language Proficiency Scale used by...
The main objective of this study was to examine whether a Rater Identity Development (RID) program wo...
© 2000 Dr. Thomas James Nathaniel Lumley. The primary purpose of this study is to investigate the proc...
The issue of score reliability has always been a contentious one in the testing of language ...
This study investigates the effects of preselected model compositions and a multiple weighted trait ...
The assessment of writing has always been threatened by rater bias. There is evidence t...
This special issue of Language Testing explores raters’ evaluations of L2 proficiency and possible c...
This study investigates the impact of rater severity and the stability of rater severity over time o...
The present research investigates the principles upon which rating scales in oral testing are constr...
Ph.D. University of Hawaii at Manoa 2012. Includes bibliographical references. Speaking performance te...
The psychometric characteristics of the Test of Written English (TWE) rating scale were explored. Ra...
The study investigated the effects of three commonly employed rater training procedures on the ratin...
Grantor: University of Toronto. I examined the verbal protocols of 4 raters of ESL compositions...
Rater training is fundamental in reducing rater variability in self- and peer-assessment practice w...