Microarray gene expression analysis: Batch effect removal improves the cross-platform consistency
ABSTRACT: Microarray analysis is a powerful technique that has been used extensively for genome-wide gene expression analysis. Several microarray technologies are available, but the lack of standardization makes it challenging to compare and integrate data from different platforms. Furthermore, batch-related biases within datasets are common but are often not addressed before data analysis, which can affect the end results. In the current study, a set of 234 breast cancer samples was analyzed on two different microarray platforms. The aim was to compare and evaluate the reproducibility and accuracy of gene expression measurements obtained from our in-house 29K array platform against data from the Agilent SurePrint G3 microarray platform. The 29K dataset contained known batch effects associated with the fabrication procedure. We demonstrate here how the ComBat batch adjustment method can unmask true biological signals by overcoming systematic technical variation caused by differences between fabrication batches and between microarray platforms. Paired correlation analysis revealed a high level of consistency between data obtained from the 29K gene expression platform and the Agilent SurePrint G3 platform, and this consistency was further improved by ComBat batch adjustment. High-variance genes in particular were reproducibly expressed across platforms. Furthermore, high concordance rates were observed both for prediction of estrogen receptor status and for intrinsic molecular breast cancer subtype classification, two clinically important parameters. In conclusion, the current study emphasizes the importance of using proper batch adjustment methods to reduce systematic technical bias when comparing and integrating data from different fabrication batches and microarray platforms.
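The idea behind the batch adjustment and the paired correlation analysis described in the abstract can be illustrated with a minimal Python sketch. This is not ComBat itself: ComBat fits a location/scale model with empirical Bayes shrinkage, whereas the hypothetical `center_batches` helper below only shifts each batch mean onto the overall mean for a single gene. The toy values, batch labels, and function names are assumptions made for illustration and are not taken from the GSE54275 data.

```python
from statistics import mean


def center_batches(values, batches):
    """Shift each batch so its mean equals the overall mean (one gene).

    Simplified stand-in for batch adjustment; real ComBat also adjusts
    variance and shrinks batch estimates with empirical Bayes.
    """
    overall = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


def pearson(x, y):
    """Pearson correlation between paired measurements."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy data: one gene, six samples measured on two platforms.
# Platform A samples come from two fabrication batches; batch "b2"
# carries an artificial offset of +3.
platform_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
platform_a = [1.0, 5.0, 3.0, 7.0, 5.0, 9.0]
batches = ["b1", "b2", "b1", "b2", "b1", "b2"]

adjusted_a = center_batches(platform_a, batches)

r_before = pearson(platform_a, platform_b)
r_after = pearson(adjusted_a, platform_b)
print(f"paired correlation before adjustment: {r_before:.3f}")
print(f"paired correlation after adjustment:  {r_after:.3f}")
```

In this toy case the cross-platform correlation rises once the batch offset is removed, which mirrors the qualitative pattern the abstract reports. Mean-centering also removes any genuine biological difference that happens to be confounded with batch, which is one reason a model-based method such as ComBat is preferred on real data.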
ORGANISM(S): Homo sapiens
PROVIDER: GSE54275 | GEO | 2015/06/01
SECONDARY ACCESSION(S): PRJNA236097
REPOSITORIES: GEO