Project description: The present dataset ("dataset 2") is a subset of a large metastudy on AML classification. In total, three datasets were generated, each containing data from a different platform: dataset 1 (Affymetrix HG-U133 A microarrays), dataset 2 (Affymetrix HG-U133 2.0 microarrays) and dataset 3 (RNA-seq). Dataset 2 was generated using the following strategy: All data sets published in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) on 20 September 2017 were reviewed for inclusion. Basic criteria for inclusion were the cell type under study (human peripheral blood mononuclear cells (PBMCs) and/or bone marrow samples) as well as the species (Homo sapiens). Furthermore, GEO SuperSeries were excluded to avoid duplicated samples. We filtered the datasets for data generated with the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570) and excluded studies with small sample sizes (< 50 samples). We then applied a disease-specific search, in which we filtered for acute myeloid leukemia, other leukemia and healthy or non-leukemia-related samples. The results of this search strategy were then internally reviewed and data were excluded based on the following criteria: (i) exclusion of duplicated samples, (ii) exclusion of studies that sorted single cell types (e.g. T cells or B cells) prior to gene expression profiling, (iii) exclusion of studies with inaccessible data. Other than that, no studies were excluded from our analysis.
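The series-level inclusion criteria described above can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the `Series` record, its field names, and the helper functions are hypothetical and exist only to show how the stated filters combine. The disease-specific search and the sample-level duplicate check are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Series:
    """Hypothetical summary of one GEO series (field names are assumptions)."""
    accession: str
    platform: str            # GEO platform ID, e.g. "GPL570"
    organism: str
    sample_source: str       # e.g. "PBMC" or "bone marrow"
    n_samples: int
    is_superseries: bool
    sorted_cell_types: bool  # single cell types sorted before profiling
    data_accessible: bool


ALLOWED_SOURCES = {"PBMC", "bone marrow"}
REQUIRED_PLATFORM = "GPL570"
MIN_SAMPLES = 50  # studies with fewer than 50 samples are excluded


def passes_inclusion(s: Series) -> bool:
    """Return True if a series meets every series-level criterion in the text."""
    return (
        s.organism == "Homo sapiens"
        and s.sample_source in ALLOWED_SOURCES
        and not s.is_superseries
        and s.platform == REQUIRED_PLATFORM
        and s.n_samples >= MIN_SAMPLES
        and not s.sorted_cell_types
        and s.data_accessible
    )


def select_series(candidates):
    """Return the accessions of all candidate series that pass the filter."""
    return [s.accession for s in candidates if passes_inclusion(s)]
```

Calling `select_series` on a list of `Series` records returns the accessions that survive all filters. A series with 49 samples, a SuperSeries, or a study of sorted T cells would each be dropped.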
In total, the datasets contained samples from the following GSE Series: GSE12417, GSE25571, GSE37642, GSE6269, GSE67684, GSE10358, GSE10792, GSE11083, GSE12187, GSE12662, GSE13159, GSE13351, GSE13501, GSE13576, GSE14062, GSE14468, GSE14615, GSE15061, GSE15434, GSE16015, GSE16214, GSE17855, GSE18323, GSE18497, GSE19314, GSE21029, GSE21261, GSE21545, GSE22707, GSE22762, GSE22845, GSE26713, GSE27383, GSE27562, GSE28460, GSE34205, GSE35784, GSE42038, GSE42057, GSE43777, GSE46480, GSE47051, GSE49695, GSE50772, GSE60926, GSE61804, GSE62156, GSE63270, GSE66002, GSE6751, GSE67596, GSE68720, GSE68735, GSE68790, GSE68833, GSE6891, GSE70536, GSE7440, GSE76705, GSE7757, GSE78132, GSE79545, GSE87072, GSE9960. All CEL files were downloaded from GEO and imported into R. Robust Multichip Average (RMA) expression measures were calculated using the R package affy.
2019-12-18 | GSE122511 | GEO