This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of discrete items. The true-score distribution is estimated by fitting a four-parameter beta model. The conditional distribution of scores on an alternate form, given the true score, is estimated from a binomial distribution based on the estimated effective test length. The agreement between classifications on two alternate forms is estimated by assuming conditional independence, given the true score. An ev...
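The steps in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The effective-test-length formula is the form commonly cited for this approach and is not given in the abstract itself. The four beta parameters (`a`, `b`, `lower`, `upper`) are taken as already estimated, since the fitting step is omitted. All function names are hypothetical.

```python
import math


def effective_test_length(mean, var, reliability, x_min, x_max):
    """Effective number of discrete items implied by the score reliability.

    Assumed formula (not stated in the abstract):
    n = ((mean - x_min)(x_max - mean) - r * var) / (var * (1 - r))
    """
    num = (mean - x_min) * (x_max - mean) - reliability * var
    den = var * (1.0 - reliability)
    return num / den


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def true_score_grid(a, b, lower, upper, points=200):
    """Discretize a four-parameter beta true-score distribution.

    Returns (true_score, weight) pairs on the proportion scale, using
    midpoints of equal-width cells and weights normalized to sum to 1.
    """
    xs = [(i + 0.5) / points for i in range(points)]
    ws = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(ws)
    return [(lower + (upper - lower) * x, w / total) for x, w in zip(xs, ws)]


def accuracy_and_consistency(n_eff, cut, grid):
    """Two-category accuracy and consistency for a proportion-scale cut.

    Accuracy: agreement between the true-score classification and the
    classification on one form. Consistency: agreement between two
    alternate forms, assumed conditionally independent given true score.
    """
    n = max(1, round(n_eff))           # binomial needs an integer length
    k_cut = math.ceil(cut * n - 1e-9)  # smallest passing number-correct
    accuracy = consistency = 0.0
    for tau, w in grid:
        p_pass = sum(binom_pmf(k, n, tau) for k in range(k_cut, n + 1))
        accuracy += w * (p_pass if tau >= cut else 1.0 - p_pass)
        consistency += w * (p_pass**2 + (1.0 - p_pass) ** 2)
    return accuracy, consistency


# Made-up inputs, for illustration only.
n_eff = effective_test_length(mean=30.0, var=64.0, reliability=0.9,
                              x_min=0.0, x_max=50.0)
grid = true_score_grid(a=5.0, b=3.0, lower=0.0, upper=1.0)
acc, cons = accuracy_and_consistency(n_eff, cut=0.6, grid=grid)
print(f"effective length={n_eff:.2f} accuracy={acc:.3f} consistency={cons:.3f}")
```

The sketch handles a single cut score. With several cut scores, the same conditional binomial probabilities would be summed over each score interval instead of over one pass region.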
The R package classify presents a number of useful functions which can be used to estimate the class...
Simulated rating data were generated according to a uni-factor model under varying conditions of: nu...
It has previously been determined that using 3 or 4 points on a categorized response scale will fail...
Much previously published material for estimating the reliability of classification has been based o...
An important feature of recent large-scale performance assessments has been the reporting of pupil a...
For a test that consists of dichotomously scored items, several approaches have been reported in the...
One important step for assessing the quality of a test is to examine...
A single-administration classification reliability index is described that estimates the probability...
As with all measurements, the measurement of examinee ability, in terms of scores that the examinee ...
The Standards for Educational and Psychological Testing (1985) recommended that test publishers pro...
Every time we make a classification based on a test score, we should expect some number of misclass...
Important decisions about students are made by combining multiple measures using complex decision ru...
As demanded by the No Child Left Behind (NCLB) legislation, state-mandated testing has increased dra...
Paper Session, E8: Test Design Issues with Diagnostic Classification Models. As the accuracy and cons...
The primary purpose of the study was to compare maximized split-half reliability coefficients, the t...