The problem of ranking and weighting experts' performances when quantitative judgments are elicited for decision support is considered. A new scoring model, the Expected Relative Frequency (ERF) model, is presented, based on the closeness between the central values provided by an expert and the known values used for calibration. A cross-validation technique, applied to expert responses from five different elicitation datasets, is used to compare this new approach with Cooke's Classical Model, the Equal Weights model, and individual experts. The analysis is performed under alternative reward schemes designed to capture proficiency either in quantifying uncertainty or in estimating true central values. Results show that although there is only a lim...