Dataset Information

Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes

ABSTRACT: Medulloblastoma (MB) is a brain cancer that arises predominantly in children. Roughly 70% of patients are cured today, but survivors often suffer from severe sequelae. MB has been extensively studied by molecular profiling, but often in small and scattered cohorts. To improve cure rates and reduce treatment side effects, accurately integrating such data to increase analytical power will be important, if not essential. We have integrated 23 transcription datasets, spanning 1350 MB and 291 normal brain samples. To remove batch effects, we combined the Removal of Unwanted Variation (RUV) method with a novel pipeline for determining empirical negative control genes and a panel of metrics to evaluate normalization performance. This approach removed most batch effects, producing a large-scale, integrative dataset of MB and cerebellar expression data. The proposed strategy will be broadly applicable for accurately integrating data and incorporating normal reference samples in studies of various diseases. We hope that the integrated dataset will advance current MB research by enabling larger-scale gene expression analyses.

For all selected samples, raw CEL files were downloaded from GEO or ArrayExpress (AE). All raw CEL files from the same platform were then processed together using the R/Bioconductor package oligo with the RMA algorithm. The Human Gene 1.0 ST and Human Gene 1.1 ST arrays were analysed at the core level, while the Human Exon 1.0 ST arrays were processed at the extended level. Next, we mapped the HG-U133 Plus 2 and Human Exon 1.0 ST identifiers to Human Gene 1.0/1.1 ST identifiers using 'Best Match' information from Affymetrix (https://www.affymetrix.com/support/technical/byproduct.affx?product=hugene-1_0-st-v1).
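The identifier mapping and mean-collapsing steps can be sketched roughly as follows. This is a minimal Python illustration, not the authors' R code: it assumes one platform's expression values are held in a dict of probe ID to per-sample values, and that the 'Best Match' file has already been parsed into a dict of source probe ID to Human Gene 1.0/1.1 ST identifier. The function name `map_and_collapse` and the example IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def map_and_collapse(expr, best_match):
    """Map source-platform probe IDs to target IDs and average duplicates.

    expr: dict of probe_id -> list of per-sample expression values
    best_match: dict of probe_id -> target identifier (parsed mapping table)
    Probes without a mapping entry are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in expr.items():
        target = best_match.get(probe)
        if target is None:
            continue  # no mapping entry: the probe is discarded
        grouped[target].append(values)
    # Collapse all source rows that map to the same target ID by their
    # column-wise (per-sample) mean.
    return {
        target: [mean(col) for col in zip(*rows)]
        for target, rows in grouped.items()
    }
```

For example, if probes `p1` and `p2` both map to `T1`, their per-sample values are averaged into a single `T1` row. Supplementary mappings (such as those from the 'Good Match' file) could be merged into `best_match` before calling the function.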
In addition, to increase the overlap between the Human Exon 1.0 ST and Human Gene 1.0/1.1 ST data, we inspected and added probe mappings from the 'Good Match' and 'Complex Match' files, including probes for the genes MYCN, PTCH1, NPR3, UNC5D, DKK2, and GABRA5. After probe identifiers had been mapped within each platform, multiple rows mapping to the same identifier were collapsed by their mean. The platform datasets were then merged on probe identifiers, and gene symbols were assigned using the hugene11sttranscriptcluster.db package. Multiple rows mapping to the same gene, or multiple columns mapping to the same patient, were likewise collapsed by their mean. Finally, the resulting gene expression matrix was quantile normalized using the corresponding function in the preprocessCore package.
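The final quantile normalization, together with the RUV batch correction described in the abstract, might look roughly like the NumPy sketch below. It is illustrative only. The quantile normalization follows the general idea behind preprocessCore (every sample is given the same reference distribution), but ties are broken by row order here, which may differ from preprocessCore. The batch correction is a simplified RUV-2/RUVg-style regression on unwanted factors estimated from negative control genes; it is a stand-in, not necessarily the RUV variant the authors applied. The control-gene indices (`control_idx`) and the number of factors (`k`) are assumed inputs, not the output of the authors' empirical selection pipeline, and the function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a genes x samples matrix.

    Each column is replaced by a common reference distribution (the mean of
    the sorted columns), preserving each sample's within-column ranks.
    Ties are broken by row order (stable sort).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")  # row indices, ascending
    ref = np.sort(x, axis=0).mean(axis=1)  # reference distribution
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = ref  # i-th smallest value gets ref[i]
    return out


def ruv2(y, control_idx, k):
    """Simplified RUV-2/RUVg-style removal of unwanted variation.

    y: genes x samples matrix (log scale)
    control_idx: indices of negative control genes (assumed given)
    k: number of unwanted factors to estimate and remove
    """
    y = np.asarray(y, dtype=float)
    yc = y - y.mean(axis=1, keepdims=True)  # centre each gene across samples
    # Estimate k unwanted factors (samples x k) from the control genes only.
    _, _, vt = np.linalg.svd(yc[control_idx], full_matrices=False)
    w = vt[:k].T
    # Least-squares fit of every gene's centred profile on the factors.
    alpha, *_ = np.linalg.lstsq(w, yc.T, rcond=None)  # k x genes
    # Subtract the fitted unwanted component; gene means are kept.
    return y - (w @ alpha).T
```

A typical call would be `ruv2(quantile_normalize(matrix), control_idx, k)`; applying RUV after quantile normalization is an assumption of this sketch. Because no covariates of interest are modelled, the correction relies entirely on the control genes being unaffected by the biology of interest, which is why the choice of negative control genes matters so much.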

ORGANISM(S): Homo sapiens

PROVIDER: GSE124814 | GEO | 2019/02/06

REPOSITORIES: GEO

Similar Datasets

p52 transgenic expression with or without LPS in mouse lungs
2017-11-22 | GSE71648 | GEO

Exon-expression profiling of CD4+ T cells derived from HTLV-1-infected individuals with or without malignancy
2015-11-01 | GSE52244 | GEO

LL-sh4/uninfected/shGFP ANOVA group
2008-06-20 | GSE11833 | GEO

Calcium and Magnesium responses in Brassica rapa
2014-05-21 | GSE44185 | GEO

Subgroup specific somatic copy number aberrations in the medulloblastoma genome [mRNA]
2012-07-27 | GSE37382 | GEO

Feasibility of unbiased RNA profiling of colorectal tumors: a proof of principle.
2016-06-15 | E-GEOD-83353 | biostudies-arrayexpress

Relative spatial homogeneity revealed by transcriptional profiling of multi-region medulloblastoma samples
2015-12-31 | GSE62803 | GEO

Sox2 signature in SHH medulloblastomas
2015-12-31 | GSE50765 | GEO

Expression data from primary medulloblastoma samples
2017-06-13 | GSE85217 | GEO