In this case, the evaluators' thresholds can generally be adjusted to improve agreement. Similarity of category definitions is reflected in marginal homogeneity between evaluators. Marginal homogeneity means that the frequencies (or, equivalently, the “base rates”) with which two evaluators use the various evaluation categories are the same. There is little consensus on the most appropriate statistical methods for analyzing evaluator agreement (here we use the generic words “evaluators” and “evaluations” to include observers, judges, diagnostic tests, etc.).
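As a rough illustration of marginal homogeneity, the sketch below compares the base rates of two evaluators from a square cross-classification table. It assumes rows hold the first evaluator's categories and columns hold the second's. The function names and the example counts are hypothetical and chosen only for illustration. This is a descriptive check, not a formal significance test.

```python
def marginal_rates(table):
    """Return (row_rates, col_rates) for a square table of counts.

    Assumption: table[i][j] is the number of cases that evaluator A
    placed in category i and evaluator B placed in category j.
    """
    total = sum(sum(row) for row in table)
    row_rates = [sum(row) / total for row in table]        # base rates of evaluator A
    col_rates = [sum(col) / total for col in zip(*table)]  # base rates of evaluator B
    return row_rates, col_rates


def max_marginal_gap(table):
    """Largest absolute difference between the two evaluators' base rates."""
    row_rates, col_rates = marginal_rates(table)
    return max(abs(r - c) for r, c in zip(row_rates, col_rates))


# Hypothetical counts: both evaluators use each category equally often.
homogeneous = [
    [20, 5, 0],
    [3, 30, 7],
    [2, 5, 28],
]

# Hypothetical counts: evaluator B uses the third category more often than A.
shifted = [
    [20, 10, 0],
    [2, 30, 13],
    [0, 5, 20],
]

if __name__ == "__main__":
    for name, table in (("homogeneous", homogeneous), ("shifted", shifted)):
        print(name, marginal_rates(table), max_marginal_gap(table))
```

A gap of zero means the two sets of base rates coincide exactly. With real data some gap is expected from sampling variation alone, so a nonzero value is not by itself evidence that the evaluators define the categories differently.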