Assessing rater performance without a 'gold standard' using consensus theory

Susan Weller, N. Clay Mann

Research output: Contribution to journal › Article

85 Citations (Scopus)

Abstract

This study illustrates the use of consensus theory to assess the diagnostic performance of raters and to estimate case diagnoses in the absence of a criterion or 'gold' standard. A description is provided of how consensus theory 'pools' information provided by raters, estimating raters' competencies and differentially weighting their responses. Although the model assumes that raters respond without bias (i.e., sensitivity = specificity), a Monte Carlo simulation with 1,200 data sets shows that model estimates appear to be robust even with bias. The model is illustrated on a set of elbow radiographs, and consensus-model estimates are compared with those obtained from follow-up data. Results indicate that with high rater competencies, the model retrieves accurate estimates of competency and case diagnoses even when raters' responses are biased.
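
The abstract describes the pooling step only in words. The sketch below illustrates how a dichotomous consensus analysis of this general kind can be carried out, following the cultural consensus formulation of Romney, Weller, and Batchelder that this paper builds on. It is a minimal sketch, not the authors' implementation: the function name, the fixed-point competence estimator, and the simulated data are all illustrative assumptions. Pairwise agreement between raters is corrected for chance, competencies are recovered as the loadings of a one-factor fit to the off-diagonal agreement matrix, and case diagnoses are then estimated by weighting each rater's vote by the log-odds of a correct response.

    import numpy as np

    def consensus_analysis(responses):
        """Dichotomous consensus analysis (illustrative sketch).

        responses: (n_raters, n_cases) array of 0/1 diagnoses.
        Returns (estimated competencies, consensus case diagnoses).
        """
        R = np.asarray(responses, dtype=float)
        n_raters, _ = R.shape

        # 1. Proportion of matching answers per rater pair, corrected for
        #    guessing on dichotomous items: m* = 2m - 1.
        M = np.array([[np.mean(R[i] == R[j]) for j in range(n_raters)]
                      for i in range(n_raters)])
        M_star = 2.0 * M - 1.0

        # 2. Under the model, E[m*_ij] = d_i * d_j for i != j, so the
        #    competencies are the loadings of a one-factor solution fit to
        #    the off-diagonal entries. Simple least-squares fixed point:
        d = np.full(n_raters, 0.5)
        for _ in range(500):
            d_new = np.empty(n_raters)
            for i in range(n_raters):
                others = np.arange(n_raters) != i
                d_new[i] = (M_star[i, others] @ d[others]
                            / (d[others] @ d[others]))
            d_new = np.clip(d_new, 1e-6, 1 - 1e-6)
            if np.max(np.abs(d_new - d)) < 1e-10:
                d = d_new
                break
            d = d_new

        # 3. Pool the panel: weight each rater's vote by the log-odds of a
        #    correct response, log((1 + d_i) / (1 - d_i)), and take the sign.
        w = np.log((1.0 + d) / (1.0 - d))
        score = w @ (2.0 * R - 1.0)      # > 0 means consensus answer is 1
        return d, (score > 0).astype(int)

    # Hypothetical demo: 5 raters of known competence rating 30 cases.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        truth = rng.integers(0, 2, size=30)
        comp = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
        p_correct = (1.0 + comp) / 2.0   # competence plus lucky guesses
        hit = rng.random((5, 30)) < p_correct[:, None]
        R = np.where(hit, truth, 1 - truth)
        d_hat, answers = consensus_analysis(R)
        print("estimated competencies:", np.round(d_hat, 2))
        print("diagnoses recovered:", np.mean(answers == truth))

Note that, like the paper's baseline model, this sketch assumes unbiased responding (sensitivity = specificity); the abstract's Monte Carlo result concerns how far the estimates degrade when that assumption is violated.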

Original language: English (US)
Pages (from-to): 71-79
Number of pages: 9
Journal: Medical Decision Making
Volume: 17
Issue number: 1
DOIs: 10.1177/0272989X9701700108
State: Published - Jan 1997

Keywords

  • clinical competence
  • consensus theory
  • diagnostic evaluation
  • interobserver variation
  • models-mathematical

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
  • Health Informatics
  • Health Information Management
  • Nursing (all)

Cite this

Assessing rater performance without a 'gold standard' using consensus theory. / Weller, Susan; Mann, N. Clay.

In: Medical Decision Making, Vol. 17, No. 1, 01.1997, p. 71-79.

Research output: Contribution to journal › Article

@article{23dc7c1739cb45f78e0abeff66242720,
title = "Assessing rater performance without a 'gold standard' using consensus theory",
abstract = "This study illustrates the use of consensus theory to assess the diagnostic performance of raters and to estimate case diagnoses in the absence of a criterion or 'gold' standard. A description is provided of how consensus theory 'pools' information provided by raters, estimating raters' competencies and differentially weighting their responses. Although the model assumes that raters respond without bias (i.e., sensitivity = specificity), a Monte Carlo simulation with 1,200 data sets shows that model estimates appear to be robust even with bias. The model is illustrated on a set of elbow radiographs, and consensus-model estimates are compared with those obtained from follow-up data. Results indicate that with high rater competencies, the model retrieves accurate estimates of competency and case diagnoses even when raters' responses are biased.",
keywords = "clinical competence, consensus theory, diagnostic evaluation, interobserver variation, models-mathematical",
author = "Weller, Susan and Mann, {N. Clay}",
year = "1997",
month = "1",
doi = "10.1177/0272989X9701700108",
language = "English (US)",
volume = "17",
pages = "71--79",
journal = "Medical Decision Making",
issn = "0272-989X",
publisher = "SAGE Publications Inc.",
number = "1",
}
