An increasingly common high-stakes form of rater-mediated assessment in education is classroom observations of teaching quality. An important assumption underlying meaningful comparisons of scores in these and other types of rater-mediated assessments is that measurement is commensurate or invariant across raters. However, research has shown that there are often important differences in raters' judgments that potentially place scores from different raters on fundamentally different scales. Despite evidence of rater differences, scores from different raters are routinely treated as if they were exchangeable and are often used to make high-stakes comparative decisions. In this study, we developed a method to accommodate measurement noninvari...
In this thesis we presented methods and procedures to test and account for measurement bias in multi...
University of Minnesota Ph.D. dissertation. August 2016. Major: Educational Psychology. Advisor: Mic...
It has previously been determined that using 3 or 4 points on a categorized response scale will fail...
In a cross-cultural study, it is crucial to understand whether items and the factorial structure of ...
Generally, ratings have notoriously low inter-rater reliabilities. Because of differences in orient...
In recent years a new methodology, the alignment method (Asparouhov & Muthén, 2014), has surfaced fo...
Of the potential sources of construct irrelevant variance or unwanted variability in performance ass...
This article shows that measurement invariance (defined in terms of an invariant measurement model i...
In educational settings, researchers are likely to encounter multilevel data with cross-classified s...
It is still common today to see questionnaires with Likert Scale items concerning very different va...
Many studies have examined the quality of automated raters, but none have focused on the potential e...
In educational measurement, various methods have been proposed to infer student proficiency from the...
This study tested measurement invariance in a quality rating causal model in tutorial-based ...
© 2019 by the National Council on Measurement in Education Researchers have explored a variety of to...