Dataset Information

Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability.

ABSTRACT:

Background

Consensus-orientated Delphi studies are increasingly used in various areas of medical research, with a variety of rating scales and criteria for reaching consensus. We explored how three different rating scales and different consensus criteria influence whether consensus is reached, and assessed the test-retest reliability of these scales within a study aimed at identifying global treatment goals for total knee arthroplasty (TKA).

Methods

We conducted a two-stage study consisting of two surveys and consecutively included patients scheduled for TKA at five German hospitals. Patients were asked to rate 19 potential treatment goals on three different rating scales (three-point, five-point and nine-point). The surveys were conducted within a 2-week period prior to TKA, and the order of questions (scales and treatment goals) was randomized.
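Consensus in this design comes down to a per-goal proportion compared against a threshold (the results report a pre-defined 75% threshold). The minimal Python sketch below illustrates that logic under stated assumptions: the cut-off that counts a rating as endorsing a goal on each scale, the function names and the example ratings are all hypothetical, not the study's actual definitions.

```python
from typing import Dict, Sequence

# HYPOTHETICAL cut-offs: the lowest rating that counts as endorsing a goal on
# each scale. The abstract does not state the study's actual definitions.
ENDORSE_FROM: Dict[int, int] = {3: 3, 5: 4, 9: 7}


def share_endorsing(ratings: Sequence[int], scale_points: int) -> float:
    """Fraction of respondents whose rating meets the scale's cut-off."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 1 <= r <= scale_points for r in ratings):
        raise ValueError(f"ratings must lie in 1..{scale_points}")
    cutoff = ENDORSE_FROM[scale_points]
    return sum(r >= cutoff for r in ratings) / len(ratings)


def reaches_consensus(ratings: Sequence[int], scale_points: int,
                      threshold: float = 0.75) -> bool:
    """True if the endorsing share meets the threshold (75% by default)."""
    return share_endorsing(ratings, scale_points) >= threshold


def goals_in_consensus(goals: Dict[str, Sequence[int]], scale_points: int,
                       threshold: float = 0.75) -> int:
    """Count how many goals reach consensus at a given threshold."""
    return sum(reaches_consensus(r, scale_points, threshold)
               for r in goals.values())


if __name__ == "__main__":
    # Made-up ratings for one goal on the nine-point scale: 6 of 8 endorse.
    example = [9, 8, 7, 7, 6, 9, 8, 5]
    print(share_endorsing(example, 9))                    # 0.75
    print(reaches_consensus(example, 9))                  # True
    print(reaches_consensus(example, 9, threshold=0.80))  # False
```

The same ratings flip from consensus to no consensus when the threshold moves from 75% to 80%, and changing the hypothetical cut-off table changes the result again, which illustrates why both the scale and the consensus criterion matter.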

Results

Eighty patients (mean age 68 ± 10 years; 70% female) completed both surveys. The three rating scales (three-point, five-point and nine-point) led to different consensus despite moderate to high correlation between them (r = 0.65 to 0.74). Final consensus was highly influenced by the choice of rating scale: 14 (three-point), 6 (five-point) and 15 (nine-point) of the 19 treatment goals reached the pre-defined 75% consensus threshold. The number of goals reaching consensus also varied considerably between rating scales at other consensus thresholds. Overall, concordance differed between the three-point (percent agreement [p] = 88.5%, weighted kappa [k] = 0.63), five-point (p = 75.3%, k = 0.47) and nine-point scale (p = 67.8%, k = 0.78).
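The concordance figures combine two test-retest statistics computed from each respondent's ratings in the two survey rounds. A minimal Python sketch of both follows. The function names and example ratings are hypothetical, and the abstract does not say which kappa weighting was used, so linear disagreement weights are assumed here, with quadratic weights available as an option.

```python
from typing import Sequence


def percent_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Share of respondents giving the identical rating in both surveys."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty rating lists")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def weighted_kappa(first: Sequence[int], second: Sequence[int], k: int,
                   weights: str = "linear") -> float:
    """Cohen's weighted kappa for ratings coded 1..k (k >= 2).

    Uses disagreement weights |i - j| / (k - 1) ("linear") or their square
    ("quadratic"). The weighting used in the study is an ASSUMPTION here.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty rating lists")
    if k < 2 or weights not in ("linear", "quadratic"):
        raise ValueError("k must be >= 2 and weights linear or quadratic")
    n = len(first)
    # Contingency table of (survey 1 rating, survey 2 rating) counts.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        if not (1 <= a <= k and 1 <= b <= k):
            raise ValueError(f"ratings must lie in 1..{k}")
        table[a - 1][b - 1] += 1
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement observed vs. expected from the marginal totals
    # (both scaled by n, so the ratio equals the ratio of proportions).
    observed = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] / n
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1 - observed / expected


if __name__ == "__main__":
    # Made-up three-point ratings of one goal by eight respondents.
    survey1 = [1, 2, 3, 3, 2, 1, 3, 2]
    survey2 = [1, 2, 3, 2, 2, 1, 3, 3]
    print(percent_agreement(survey1, survey2))              # 0.75
    print(round(weighted_kappa(survey1, survey2, k=3), 3))  # 0.704
```

Percent agreement only counts exact matches, whereas weighted kappa corrects for chance agreement and gives partial credit to near misses, so the two statistics can rank scales differently.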

Conclusion

This study provides evidence that consensus depends on the rating scale and consensus threshold within one population. The test-retest reliability of the three rating scales investigated differs substantially between individual treatment goals. This variation in reliability can become a potential source of bias in consensus studies. In our setting aimed at capturing patients' treatment goals for TKA, the three-point scale proves to be the most reasonable choice, as its translation into the clinical context is the most straightforward among the scales. Researchers conducting Delphi studies should be aware that final consensus is substantially influenced by the choice of rating scale and consensus criteria.

SUBMITTER: Lange T

PROVIDER: S-EPMC7011537 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability.

Lange T, Kopkow C, Lützner J, Günther KP, Gravius S, Scharf HP, Stöve J, Wagner R, Schmitt J

BMC Medical Research Methodology. 2020 Feb 10; issue 1.


PMID: 32041541

Similar Datasets

Project description: Functional magnetic resonance imaging studies frequently use emotional face processing tasks to probe neural circuitry related to psychiatric disorders and treatments with an emphasis on regions within the salience network (e.g., amygdala). Findings across previous test-retest reliability studies of emotional face processing have shown high variability, potentially due to differences in data analytic approaches. The present study comprehensively examined the test-retest reliability of an emotional faces task utilizing multiple approaches to region of interest (ROI) analysis and by examining voxel-wise reliability across the entire brain for both neural activation and functional connectivity. Analyses included 42 healthy adult participants who completed an fMRI scan concurrent with an emotional faces task on two separate days with an average of 25.52 days between scans. Intraclass correlation coefficients (ICCs) were calculated for the 'FACES-SHAPES' and 'FACES' (compared to implicit baseline) contrasts across the following: anatomical ROIs identified from a publicly available brain atlas (i.e., Brainnetome), functional ROIs consisting of 5-mm spheres centered on peak voxels from a publicly available meta-analytic database (i.e., Neurosynth), and whole-brain, voxel-wise analysis. Whole-brain, voxel-wise analyses of functional connectivity were also conducted using both anatomical and functional seed ROIs. While group-averaged neural activation maps were consistent across time, only one anatomical ROI and two functional ROIs showed good or excellent individual-level reliability for neural activation. The anatomical ROI was the right medioventral fusiform gyrus for the FACES contrast (ICC = 0.60). The functional ROIs were the left and the right fusiform face area (FFA) for both FACES-SHAPES and FACES (Left FFA ICCs = 0.69 & 0.79; Right FFA ICCs = 0.68 & 0.66).
Poor reliability (ICCs < 0.4) was identified for almost all other anatomical and functional ROIs, with some exceptions showing fair reliability (ICCs = 0.4-0.59). Whole-brain voxel-wise analysis of neural activation identified voxels with good (ICCs = 0.6-0.74) to excellent reliability (ICCs > 0.75) that were primarily located in visual cortex, with several clusters in bilateral dorsal lateral prefrontal cortex (DLPFC). Whole-brain voxel-wise analyses of functional connectivity for amygdala and fusiform gyrus identified very few voxels with good to excellent reliability using both anatomical and functional seed ROIs. Exceptions included clusters in right cerebellum and right DLPFC that showed reliable connectivity with left amygdala (ICCs > 0.6). In conclusion, results indicate that visual cortical regions demonstrate good reliability at the individual level for neural activation, but reliability is generally poor for salience regions often focused on within psychiatric research (e.g., amygdala). Given these findings, future clinical neuroimaging studies using emotional faces tasks to examine individual differences might instead focus on visual regions and their role in psychiatric disorders.

Project description:
Background: Neurocognitive testing is an important concussion evaluation tool, but for neurocognitive tests to be useful, their psychometric properties must be well established. Test-retest reliability of computerized neurocognitive tests can influence their clinical utility. The reliability for a commonly used computerized neurocognitive test, CNS Vital Signs, is not well established. The purpose of this study was to examine test-retest reliability and reliable change indices for CNS Vital Signs in a healthy, physically active college population.
Hypothesis: CNS Vital Signs yields acceptable test-retest reliability, with greater reliability between the second and third test administration compared with between the first and second administration.
Study design: Cohort study.
Level of evidence: Level 3.
Methods: Forty healthy, active volunteers (16 men, 24 women; mean age, 21.05 ± 2.17 years) reported to a clinical laboratory for 3 sessions, 1 week apart. At each session, participants were administered CNS Vital Signs. Outcomes included standard scores for the following CNS Vital Signs domains: verbal memory, visual memory, psychomotor speed, cognitive flexibility, complex attention, processing speed, reaction time, executive functioning, and reasoning.
Results: Participants performed significantly better on the second session and/or third session than they did on the first testing session on 6 of 9 neurocognitive domains. Pearson r test-retest correlations between sessions ranged from 0.11 to 0.87. Intraclass correlation coefficients ranged from 0.10 to 0.86.
Conclusion: Clinicians should consider using reliable change indices to account for practice effects, identify meaningful score changes due to pathology, and inform clinical decisions.
Clinical relevance: This study highlights the importance of clinicians understanding the psychometric properties of computerized neurocognitive tests when using them in the management of sport-related concussion. If CNS Vital Signs is administered twice within a small time frame (such as 1 week), athletes should be expected to improve between the first and second administration.


OmicsDI is part of the ELIXIR infrastructure
