Project description: We present a meta-dataset comprising a total of 1566 samples, including both primary tumors and tumor-free colorectal tissues, from 15 independent GEO datasets. To minimise inter-platform variation, only datasets generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) were included in the meta-dataset. Using multiple open-source R packages implemented in our previously developed bioinformatics pipeline, each dataset was preprocessed with RMA normalisation; the datasets were then merged and corrected for batch effects using the ComBat method. With its increased sample size, the present meta-dataset serves as an excellent 'discovery cohort' for identifying genes differentially expressed in the diseased phenotype.
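To illustrate what batch-effect correction does to a merged expression matrix, the following is a minimal sketch in Python. It is not the ComBat method and not the pipeline used here (which relies on R packages): it only applies a plain per-gene location and scale adjustment to each batch, without ComBat's empirical Bayes shrinkage or phenotype covariates. The function name `adjust_batches` and the toy values are hypothetical and chosen purely for illustration.

```python
from statistics import fmean, pstdev


def adjust_batches(expr, batches):
    """Simplified per-gene location/scale batch adjustment (NOT full ComBat).

    expr:    list of gene rows, each a list of log2 expression values per sample
    batches: list of batch labels, one per sample (e.g. the source GEO series)
    """
    labels = sorted(set(batches))
    adjusted = []
    for row in expr:
        grand_mean = fmean(row)

        # Remove each batch's own mean so only within-batch variation remains.
        resid = [0.0] * len(row)
        for label in labels:
            idx = [i for i, b in enumerate(batches) if b == label]
            batch_mean = fmean([row[i] for i in idx])
            for i in idx:
                resid[i] = row[i] - batch_mean

        # Pooled within-batch standard deviation for this gene.
        pooled_sd = pstdev(resid) or 1.0

        # Rescale every batch to the pooled spread and restore the grand mean.
        out = [0.0] * len(row)
        for label in labels:
            idx = [i for i, b in enumerate(batches) if b == label]
            batch_sd = pstdev([resid[i] for i in idx]) or 1.0
            for i in idx:
                out[i] = resid[i] / batch_sd * pooled_sd + grand_mean
        adjusted.append(out)
    return adjusted


# Toy example: 2 genes x 6 samples from two hypothetical batches.
expr = [
    [5.0, 6.0, 7.0, 9.0, 10.0, 11.0],
    [2.0, 2.5, 3.0, 4.0, 6.0, 8.0],
]
batches = ["A", "A", "A", "B", "B", "B"]
adjusted = adjust_batches(expr, batches)
```

After adjustment, every batch shares the same per-gene mean and spread. Note that a naive adjustment like this would also erase genuine biological differences if a batch were confounded with phenotype, which is one reason methods such as ComBat allow covariates to be modelled.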