Dataset Information

Self-assessed performance improves statistical fusion of image labels.

ABSTRACT: Expert manual labeling is the gold standard for image segmentation, but this process is difficult, time-consuming, and prone to inter-individual differences. While fully automated methods have successfully targeted many anatomies, automated methods have not yet been developed for numerous essential structures (e.g., the internal structure of the spinal cord as seen on magnetic resonance imaging). Collaborative labeling is a new paradigm that offers a robust alternative that may realize both the throughput of automation and the guidance of experts. Yet, distributing manual labeling expertise across individuals and sites introduces potential human factors concerns (e.g., training, software usability) and statistical considerations (e.g., fusion of information, assessment of confidence, bias) that must be further explored. During the labeling process, it is simple to ask raters to self-assess the confidence of their labels, but this is rarely done and has not been previously quantitatively studied. Herein, the authors explore the utility of self-assessment in relation to automated assessment of rater performance in the context of statistical fusion.

The authors conducted a study of 66 volumes manually labeled by 75 minimally trained human raters recruited from the university undergraduate population. Raters were given 15 min of training during which they were shown examples of correct segmentation, and the online segmentation tool was demonstrated. The volumes were labeled 2D slice-wise, and the slices were unordered. A self-assessed quality metric was produced by raters for each slice by marking a confidence bar superimposed on the slice. Volumes produced by both voting and statistical fusion algorithms were compared against a set of expert segmentations of the same volumes.

Labels for 8825 distinct slices were obtained. Simple majority voting resulted in statistically poorer performance than voting weighted by self-assessed performance. Statistical fusion resulted in statistically indistinguishable performance from self-assessed weighted voting. The authors developed a new theoretical basis for using self-assessed performance in the framework of statistical fusion and demonstrated that the combined sources of information (both statistical assessment and self-assessment) yielded statistically significant improvement over the methods considered separately.

The authors present the first systematic characterization of self-assessed performance in manual labeling. The authors demonstrate that self-assessment and statistical fusion yield similar, but complementary, benefits for label fusion. Finally, the authors present a new theoretical basis for combining self-assessments with statistical label fusion.
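The difference between simple majority voting and voting weighted by self-assessed confidence can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover statistical fusion. The function name `weighted_vote`, the toy labels, and the confidence values are hypothetical, and the sketch assumes one confidence value per rater for a single slice, applied to every pixel that rater labeled.

```python
from collections import defaultdict


def weighted_vote(labels, weights=None):
    """Fuse per-pixel labels from several raters for one slice.

    labels  : list of equal-length label sequences, one per rater
    weights : one non-negative weight per rater; None means equal
              weights, which reduces to simple majority voting
    """
    if weights is None:
        weights = [1.0] * len(labels)
    fused = []
    for pixel_labels in zip(*labels):
        tally = defaultdict(float)
        for label, weight in zip(pixel_labels, weights):
            tally[label] += weight
        # Pick the label with the largest total weight; ties go to the
        # smallest label value (an arbitrary choice for this sketch).
        fused.append(min(tally, key=lambda k: (-tally[k], k)))
    return fused


# Hypothetical toy data: three raters, four pixels, binary labels.
rater_labels = [
    [1, 1, 0, 0],  # rater A
    [0, 0, 0, 1],  # rater B
    [0, 0, 1, 1],  # rater C
]
# Hypothetical self-assessed confidence for this slice, one per rater.
self_confidence = [0.9, 0.2, 0.3]

majority = weighted_vote(rater_labels)
confidence_weighted = weighted_vote(rater_labels, self_confidence)
```

In this toy case the single confident rater outweighs the two low-confidence raters wherever they disagree, so the two fused results differ. Whether that helps in practice depends on how well self-assessed confidence tracks actual accuracy, which is the question the study examines.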

SUBMITTER: Bryan FW

PROVIDER: S-EPMC3978333 | biostudies-other | 2014 Mar

REPOSITORIES: biostudies-other


Similar Datasets

Project description:

Background: Physical function constitutes a key component of outcome assessment for almost all osteoarthritis interventions. The aim was to compare physical function measured using a self-assessed performance-based test versus self-reported function using questionnaires among individuals with knee or hip osteoarthritis (OA) participating in a digital exercise and education therapy.

Methods: We analysed data from individuals aged 40+ years participating in the digital program. We extracted data on the self-assessed 30-second chair stand test (30s CST) and the function subscales of Knee injury/Hip disability and Osteoarthritis Outcome Score 12 (KOOS-12/HOOS-12) at enrolment and 3- (n = 10884) and 12-month (n = 3554) follow-ups. Participants completed Numeric Rating Scale (NRS) pain, EQ-5D-5L, and an external anchor: global rating of change scale. Correlations were assessed using the Spearman correlation coefficient, responsiveness using standardized response mean (SRM) and receiver operating characteristic (ROC) curves, and agreement using weighted percent of agreement and weighted Gwet's agreement coefficient.

Results: Correlations were weak between the 30s CST and KOOS-12/HOOS-12 function (r < 0.35 for raw and r < 0.20 for change scores). Correlations with NRS pain and EQ-5D-5L were stronger for the KOOS-12/HOOS-12 function subscale than for 30s CST. Greater internal (SRM > 1 vs. SRM < 0.5) and lower external responsiveness were observed for the 30s CST versus the KOOS-12/HOOS-12 function, even though external responsiveness was generally inadequate for both (the area under the ROC curves < 0.7). The direction of change was similar for the two function measures for about 70% of subjects with moderate agreement between them (weighted Gwet's agreement coefficient range 0.45 to 0.50).

Conclusion: Weak correlations and moderate agreements between function measured using a performance-based test and self-reported using KOOS-12/HOOS-12 in people with knee or hip OA suggest that they may capture different aspects of functional abilities in this population.
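Two of the statistics named in the methods above can be sketched in a few lines of Python, assuming the usual definitions: the standardized response mean as the mean change divided by the sample standard deviation of the change, and the Spearman coefficient as the Pearson correlation of average ranks. The function names and the toy scores are hypothetical and are not taken from the study data.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM: mean of the paired changes over their sample standard deviation."""
    change = [after - before for before, after in zip(baseline, follow_up)]
    return mean(change) / stdev(change)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return numerator / denominator


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical chair-stand counts at enrolment and follow-up.
baseline = [10, 12, 9, 11]
follow_up = [13, 14, 12, 15]
srm = standardized_response_mean(baseline, follow_up)
```

With these definitions, a consistent improvement with little spread gives a large SRM even when the absolute change is modest, which is one reason internal responsiveness and correlation with other measures can tell different stories.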
