Dataset Information

Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies.


ABSTRACT: Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparent mislabeled samples in 46% of the datasets studied, yielding a 99% confidence lower-bound estimate for all studies of 33%. In a separate analysis of a set of datasets concerning a single cohort of subjects, 2/4 had mislabeled samples, indicating laboratory mix-ups rather than data recording errors. While the number of mixed-up samples per study was generally small, because our method can only identify a subset of potential mix-ups, our estimate is conservative for the breadth of the problem. Our findings emphasize the need for more stringent sample tracking, and that re-users of published data must be alert to the possibility of annotation and labelling errors.
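The check described in the abstract, comparing annotated sex against the expression of sex-specific genes, could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the genes or thresholds used, so the marker genes (`XIST`, `RPS4Y1`), the simple "higher marker wins" rule, the function names, and the toy expression values are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed marker genes for illustration only; the abstract does not say which genes were used.
FEMALE_MARKER = "XIST"    # X-inactivation transcript, typically high in female samples
MALE_MARKER = "RPS4Y1"    # Y-linked gene, typically expressed only in male samples


def predict_sex(expr: Dict[str, float]) -> str:
    """Return 'F' if the female marker is expressed above the male marker, else 'M'.

    A deliberately simple rule (an assumption); a real analysis would need
    per-dataset normalization and a way to handle ambiguous samples.
    """
    return "F" if expr[FEMALE_MARKER] > expr[MALE_MARKER] else "M"


def find_discordant(samples: Dict[str, Dict[str, float]],
                    annotated: Dict[str, str]) -> List[str]:
    """List sample IDs whose expression-based call disagrees with the annotation."""
    return sorted(
        sample_id
        for sample_id, expr in samples.items()
        if predict_sex(expr) != annotated[sample_id]
    )


# Toy, made-up log-expression values (not taken from any real dataset).
samples = {
    "s1": {"XIST": 9.1, "RPS4Y1": 3.0},
    "s2": {"XIST": 3.2, "RPS4Y1": 10.4},
    "s3": {"XIST": 8.7, "RPS4Y1": 2.9},
}
annotated = {"s1": "F", "s2": "M", "s3": "M"}

discordant = find_discordant(samples, annotated)
print(discordant)
```

In this toy input, sample `s3` is annotated as male but has a female-like expression pattern, so it is the one flagged. As the abstract notes, a check of this kind can only catch mix-ups that cross sex, which is why the resulting estimate is conservative.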

SUBMITTER: Toker L 

PROVIDER: S-EPMC5034794 | biostudies-literature | 2016

REPOSITORIES: biostudies-literature

Publications

Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies.

Toker L, Feng M, Pavlidis P

F1000Research, 2016-08-30


Similar Datasets

| S-EPMC4581699 | biostudies-literature
| S-EPMC1489946 | biostudies-literature
| S-EPMC3642700 | biostudies-literature
| S-EPMC7064187 | biostudies-literature
| S-EPMC10641963 | biostudies-literature
| S-EPMC4858488 | biostudies-literature
| S-EPMC7819463 | biostudies-literature
| S-EPMC9017657 | biostudies-literature
| S-EPMC6298257 | biostudies-literature
| S-EPMC10353316 | biostudies-literature