Dataset Information

Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons' Data.


ABSTRACT: We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample-, gene-, and probe-level studies were performed to quantify the degree of similarity between the 'legacy' GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as 'harmonized' by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.

SUBMITTER: Gao GF 

PROVIDER: S-EPMC6707074 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons' Data.

Gao Galen F, Parker Joel S, Reynolds Sheila M, Silva Tiago C, Wang Liang-Bo, Zhou Wanding, Akbani Rehan, Bailey Matthew, Balu Saianand, Berman Benjamin P, Brooks Denise, Chen Hu, Cherniack Andrew D, Demchok John A, Ding Li, Felau Ina, Gaheen Sharon, Gerhard Daniela S, Heiman David I, Hernandez Kyle M, Hoadley Katherine A, Jayasinghe Reyka, Kemal Anab, Knijnenburg Theo A, Laird Peter W, Mensah Michael K A, Mungall Andrew J, Robertson A Gordon, Shen Hui, Tarnuzzer Roy, Wang Zhining, Wyczalkowski Matthew, Yang Liming, Zenklusen Jean C, Zhang Zhenyu, Liang Han, Noble Michael S

Cell systems 20190701 1


Similar Datasets

| S-EPMC7900240 | biostudies-literature
| S-EPMC5683428 | biostudies-literature
| S-EPMC5538035 | biostudies-literature
| S-EPMC8156611 | biostudies-literature
| S-EPMC8918142 | biostudies-literature
| S-EPMC3748065 | biostudies-literature
| S-EPMC9882125 | biostudies-literature
| S-EPMC4387197 | biostudies-literature
| S-EPMC6365517 | biostudies-literature
| S-EPMC8830729 | biostudies-literature