Dataset Information

The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles.


ABSTRACT: Whole-genome analysis of cancer specimens is commonplace, and investigators frequently share or re-use specimens in later studies. Duplicate expression profiles in public databases will impact re-analysis if left undetected, a so-called "doppelgänger" effect. We propose a method that should be routine practice to accurately match duplicate cancer transcriptomes when nucleotide-level sequence data are unavailable, even for samples profiled by different microarray technologies or by both microarray and RNA sequencing. We demonstrate the effectiveness of the method in databases containing dozens of datasets and thousands of ovarian, breast, bladder, and colorectal cancer microarray profiles and of matching microarray and RNA sequencing expression profiles from The Cancer Genome Atlas (TCGA). We identified probable duplicates among more than 50% of studies, originating in different continents, using different technologies, published years apart, and even within the TCGA itself. Finally, we provide the doppelgangR Bioconductor package for screening transcriptome databases for duplicates. Given the potential for unrecognized duplication to falsely inflate prediction accuracy and confidence in differential expression, doppelgänger-checking should be a part of standard procedure for combining multiple genomic datasets.
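
To make the idea of correlation-based duplicate screening concrete, here is a minimal Python sketch. It is an illustration only and not the doppelgangR implementation: the function names (`pearson`, `flag_candidate_duplicates`), the toy profiles, and the fixed 0.95 cutoff are all assumptions made for this example. It compares every sample in one dataset with every sample in another and reports pairs whose expression profiles are almost perfectly correlated, which can still happen when two platforms report the same specimen on different scales.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_candidate_duplicates(dataset_a, dataset_b, threshold=0.95):
    """Return (sample_a, sample_b, r) for cross-dataset pairs with r >= threshold.

    Each dataset maps a sample ID to a list of expression values, with genes
    in the same order in both datasets. The threshold is a hypothetical,
    hand-picked cutoff chosen for this toy example.
    """
    hits = []
    for id_a, id_b in product(dataset_a, dataset_b):
        r = pearson(dataset_a[id_a], dataset_b[id_b])
        if r >= threshold:
            hits.append((id_a, id_b, r))
    # Strongest candidate pairs first.
    return sorted(hits, key=lambda hit: -hit[2])


# Toy data (invented): B1 is A2 rescaled as 2 * value + 1, mimicking the same
# specimen measured on a platform with a different scale and offset.
study_a = {
    "A1": [5.0, 2.0, 7.0, 1.0, 4.0, 6.0],
    "A2": [1.0, 6.0, 2.0, 5.0, 3.0, 2.0],
}
study_b = {
    "B1": [3.0, 13.0, 5.0, 11.0, 7.0, 5.0],
    "B2": [4.0, 1.0, 3.0, 6.0, 2.0, 5.0],
}

hits = flag_candidate_duplicates(study_a, study_b)
for id_a, id_b, r in hits:
    print(f"{id_a} and {id_b} look like duplicates (r = {r:.3f})")
```

On real data, profiles from unrelated tumors also tend to correlate strongly with one another, so a fixed cutoff like this one would likely need to be replaced by batch correction and a data-driven outlier rule.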

SUBMITTER: Waldron L 

PROVIDER: S-EPMC5241903 | biostudies-other | 2016 Nov

REPOSITORIES: biostudies-other

Publications

The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles.

Waldron L, Riester M, Ramos M, Parmigiani G, Birrer M

Journal of the National Cancer Institute, 2016-07-05, issue 11


Similar Datasets

| S-EPMC5119741 | biostudies-literature
| S-EPMC5858588 | biostudies-literature
| S-EPMC8011841 | biostudies-literature
| S-EPMC4495376 | biostudies-literature
| S-EPMC11342880 | biostudies-literature
| S-EPMC5037975 | biostudies-literature
2018-06-30 | GSE70411 | GEO
| S-EPMC10474133 | biostudies-literature
| S-EPMC7786106 | biostudies-literature
| S-EPMC6591892 | biostudies-literature