This study investigated the extent to which two teams of experienced raters from different European countries (Finland and Austria), each using its own CEFR-based rating scale (one holistic, one analytic), agreed on the CEFR level of students’ writing performances. Both teams rated one hundred performances written by Austrian secondary school students in response to two tasks. The Finnish raters (N = 3) applied a holistic CEFR-linked rating scale, developed in Finland, consisting of verbatim CEFR descriptors, while the Austrian team (N = 6) used an analytic CEFR-linked rating scale, developed in Austria, consisting of four criteria. The ratings were analysed using the Rasch model. Although there were individual differences in rater severity among both ...
The study aims to investigate the extent to which raters exhibit tendencies towards being overly sev...
© 2000 Dr. Thomas James Nathaniel Lumley. The primary purpose of this study is to investigate the proc...
This study aims to compare five rating behaviors of eight raters: four novice raters and four experien...
The Common European Framework of Reference (CEFR; Council of Europe, 2001) provides a competency mod...
There is relatively little research on how well the CEFR and similar holistic scales work when they...
Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges’ ...
The Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR) by t...
Background: Although teachers of English are required to assess students’ speaking proficiency in th...
© The Author(s) 2015. Considering scoring validity as encompassing both reliable rating scal...
In this study, various proficiency classification methods are explored in order to describe the rele...
The article examines the use of two CEFR-based rating scales in assessing L1 and L2 texts in Swedish...
Background: The CEFR, ever since its inception, has had a profound impact on language teaching...
This study aimed to examine the effect of rater training on the differential rater function (rater e...
Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners'...
It is well known from studies of inter-rater reliability that assessments of writing tests vary. In ...