Dataset Information

Measuring the effect of inter-study variability on estimating prediction error.

ABSTRACT:

Background

The biomarker discovery field is replete with molecular signatures that have not translated into the clinic despite ostensibly promising performance in predicting disease phenotypes. One widely cited reason is lack of classification consistency, largely due to failure to maintain performance from study to study. This failure is widely attributed to variability in data collected for the same phenotype among disparate studies, due to technical factors unrelated to phenotypes (e.g., laboratory settings resulting in "batch-effects") and non-phenotype-associated biological variation in the underlying populations. These sources of variability persist in new data collection technologies.

Methods

Here we quantify the impact of these combined "study-effects" on a disease signature's predictive performance by comparing two types of validation methods: ordinary randomized cross-validation (RCV), which extracts random subsets of samples for testing, and inter-study validation (ISV), which excludes an entire study for testing. Whereas RCV hardwires an assumption of training and testing on identically distributed data, this key property is lost in ISV, yielding systematic decreases in performance estimates relative to RCV. Measuring the RCV-ISV difference as a function of the number of studies quantifies the influence of study-effects on performance.
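The difference between the two validation schemes comes down to how test samples are chosen. The following is a minimal sketch of that difference, not the authors' implementation: the function names `rcv_splits` and `isv_splits`, the fold count, and the seed are assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def rcv_splits(n_samples, n_folds, seed=0):
    """Randomized cross-validation (RCV): shuffle all samples, ignoring
    which study they came from, and hold out one random fold at a time."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def isv_splits(study_ids):
    """Inter-study validation (ISV): hold out every sample of one study
    at a time, so the test study is never seen during training."""
    by_study = defaultdict(list)
    for i, study in enumerate(study_ids):
        by_study[study].append(i)
    for held_out in sorted(by_study):
        test = by_study[held_out]
        train = [i for i, study in enumerate(study_ids) if study != held_out]
        yield held_out, train, test


# Hypothetical study labels for six samples (placeholders, not real data).
studies = ["A", "A", "B", "B", "B", "C"]
for held_out, train, test in isv_splits(studies):
    print(held_out, train, test)
```

In the RCV splits, samples from the same study can land on both sides of the split, which is what makes training and test data look identically distributed; in the ISV splits they cannot.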

Results

As a case study, we gathered publicly available gene expression data from 1,470 microarray samples of 6 lung phenotypes from 26 independent experimental studies and 769 RNA-seq samples of 2 lung phenotypes from 4 independent studies. We find that the RCV-ISV performance discrepancy is greater in phenotypes with few studies, and that the ISV performance converges toward RCV performance as data from additional studies are incorporated into classification.

Conclusions

We show that by examining how fast ISV performance approaches RCV as the number of studies is increased, one can estimate when "sufficient" diversity has been achieved for learning a molecular signature likely to translate without significant loss of accuracy to new clinical settings.
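One simple way to read "sufficient" diversity off such a curve is to look for the smallest number of studies beyond which the RCV-ISV gap stays under a chosen tolerance. The sketch below assumes that criterion; the helper names, the tolerance, and the accuracy values are hypothetical placeholders, not results from the paper.

```python
def rcv_isv_gap(rcv_acc, isv_acc):
    """Return the RCV-minus-ISV performance gap for each number of
    studies present in both dictionaries (keys: number of studies)."""
    return {n: rcv_acc[n] - isv_acc[n] for n in sorted(rcv_acc) if n in isv_acc}


def studies_needed(gap_by_n, tolerance):
    """Return the smallest number of studies from which the gap stays at
    or below `tolerance` for all larger study counts, or None if never."""
    for n in sorted(gap_by_n):
        if all(gap <= tolerance for m, gap in gap_by_n.items() if m >= n):
            return n
    return None


# Made-up accuracies keyed by number of studies (illustration only).
rcv_accuracy = {2: 0.90, 3: 0.91, 4: 0.90, 5: 0.91}
isv_accuracy = {2: 0.70, 3: 0.80, 4: 0.86, 5: 0.89}

gaps = rcv_isv_gap(rcv_accuracy, isv_accuracy)
print(studies_needed(gaps, tolerance=0.05))
```

The tolerance is a judgment call that depends on how much loss of accuracy is acceptable in a new clinical setting; other convergence criteria (for example, fitting a curve to the gap) would be equally reasonable.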

SUBMITTER: Ma S

PROVIDER: S-EPMC4201588 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Measuring the effect of inter-study variability on estimating prediction error.

Ma S, Sung J, Magis AT, Wang Y, Geman D, Price ND

PLoS One, 2014-10-17, issue 10

PMID: 25330348

Similar Datasets

Project description: The stop signal task has been used to quantify human inhibitory control. Inter-subject and intra-subject variability in response inhibition were investigated in a realistic environmental scenario. In the present study, we used a battleground scenario where a sniper-scope picture was the background, a target picture was a go signal, and a nontarget picture was a stop signal. The task instructions were to respond to the target image and inhibit the response if a nontarget image appeared. This scenario produced a threatening situation and allowed evaluation of how a subject's response inhibition manifests in a real situation. In this study, 32 channels of electroencephalography (EEG) signals were collected from 20 participants during successful stop (response inhibition) and failed stop (response) trials. These EEG signals were used to predict two possible outcomes: successful stop or failed stop. The inter-subject variability (between-subjects) and intra-subject variability (within-subjects) affect the performance of participants in the classification system. The EEG signals of successful stop versus failed stop trials were classified using quadratic discriminant analysis (QDA) and linear discriminant analysis (LDA) (i.e., parametric) and K-nearest neighbor classifier (KNNC) and Parzen density-based classifier (PARZEN) (i.e., nonparametric) under inter- and intra-subject variability. The EEG activities were found to increase during response inhibition in the frontal cortex (F3 and F4), presupplementary motor area (C3 and C4), parietal lobe (P3 and P4), and occipital lobe (O1 and O2). Therefore, the power spectral density (PSD) of EEG signals (1-50 Hz) at the F3, F4, C3, C4, P3, P4, O1, and O2 electrodes was measured in successful stop and failed stop trials. The PSD of the EEG signals was used as the feature input for the classifiers.
Our proposed method shows an intra-subject classification accuracy of 97.61% for subject 15 with the QDA classifier in C3 (left motor cortex) and an overall inter-subject classification accuracy of 71.66% ± 9.81% with the KNNC in F3 (left frontal lobe). These results show how inter-subject and intra-subject variability affect the performance of the classification system. These findings can be used effectively to improve the understanding of the psychopathology of attention deficit hyperactivity disorder (ADHD), obsessive-compulsive disorder (OCD), schizophrenia, and suicidality.
