
Dataset Information


Summary measures of agreement and association between many raters' ordinal classifications.


ABSTRACT: PURPOSE: Interpretation of screening tests such as mammograms usually requires a radiologist's subjective visual assessment of images, which often results in substantial discrepancies between radiologists' classifications of subjects' test results. In clinical screening studies that assess the strength of agreement between experts, multiple raters are often recruited to classify subjects' test results on an ordinal scale. However, traditional measures of agreement can be difficult to apply in some studies because of the large number of raters, the ordinal classification scale, and unbalanced data. METHODS: We assess and compare the performance of existing measures of agreement and association, as well as a newly developed model-based measure of agreement, by applying them to three large-scale clinical screening studies involving many raters' ordinal classifications. We also conduct a simulation study to demonstrate the key properties of the summary measures. RESULTS: The assessment of agreement and association varied according to the choice of summary measure. Some measures were influenced by the underlying prevalence of disease and by raters' marginal distributions, and/or were limited to balanced data sets in which every rater classifies every subject. Our simulation study indicated that popular measures of agreement and association are sensitive to the underlying disease prevalence. CONCLUSIONS: Model-based measures provide a flexible approach for calculating agreement and association and are robust to missing and unbalanced data as well as to the underlying disease prevalence.
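The abstract's point that popular agreement measures depend on disease prevalence and on balanced data can be seen with a small example. The sketch below uses Fleiss' kappa, a widely used multi-rater agreement statistic chosen here only for illustration; the abstract does not name which existing measures the study compared, and this is not the authors' model-based measure. The helper names `observed_agreement` and `fleiss_kappa` and the two toy data sets are hypothetical. Fleiss' kappa also treats categories as nominal, so it ignores the ordering of an ordinal scale.

```python
from typing import Sequence


def observed_agreement(counts: Sequence[Sequence[int]]) -> float:
    """Mean proportion of agreeing rater pairs per subject (Fleiss' P-bar).

    counts[i][j] is the number of raters who put subject i in category j.
    Every row must sum to the same number of raters (balanced data).
    """
    n_raters = sum(counts[0])
    # The formula assumes every subject is rated by the same number of raters.
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("unbalanced data: every subject needs the same number of raters")
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    pairs = n_raters * (n_raters - 1)
    return sum((sum(c * c for c in row) - n_raters) / pairs for row in counts) / len(counts)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa: chance-corrected agreement, with categories treated as nominal."""
    p_bar = observed_agreement(counts)  # also validates that the data are balanced
    total = len(counts) * sum(counts[0])
    # Chance agreement comes from the pooled marginal category proportions,
    # which is what ties kappa to the prevalence of each category.
    p_e = sum((sum(row[j] for row in counts) / total) ** 2 for j in range(len(counts[0])))
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Both data sets have 3 raters, 10 subjects, and the same observed
    # agreement (13/15); only the prevalence of "positive" ratings differs.
    balanced = [[3, 0]] * 4 + [[0, 3]] * 4 + [[2, 1]] * 2
    rare = [[3, 0]] * 8 + [[2, 1]] * 2
    print(round(observed_agreement(balanced), 3), round(fleiss_kappa(balanced), 3))  # 0.867 0.732
    print(round(observed_agreement(rare), 3), round(fleiss_kappa(rare), 3))  # 0.867 -0.071
```

Both toy data sets have the same observed agreement, yet kappa is about 0.73 when positive and negative ratings are roughly equally common and slightly negative when positive ratings are rare, because the chance-agreement term is computed from the pooled category proportions. The function also rejects unbalanced data, mirroring the restriction to data sets where every rater classifies every subject.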

SUBMITTER: Mitani AA 

PROVIDER: S-EPMC5687310 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature


Publications

Summary measures of agreement and association between many raters' ordinal classifications.

Aya A. Mitani, Phoebe E. Freer, Kerrie P. Nelson

Annals of Epidemiology, 2017-09-22, issue 10



Similar Datasets

| S-EPMC4560692 | biostudies-literature
| S-EPMC8048180 | biostudies-literature
| S-EPMC5540881 | biostudies-literature
| S-EPMC9631095 | biostudies-literature
| S-EPMC2488396 | biostudies-literature
| S-EPMC7138463 | biostudies-literature
| S-EPMC3118948 | biostudies-literature
| S-EPMC7649720 | biostudies-literature
| S-EPMC6249138 | biostudies-literature
| S-EPMC7233794 | biostudies-literature