ABSTRACT: Context: In many cancers, specific subpopulations of cells appear to be uniquely capable of initiating and maintaining tumors. The strongest support for this cancer stem cell model comes from transplantation assays in immune-deficient mice indicating that human acute myeloid leukemia (AML) is organized as a cellular hierarchy driven by self-renewing leukemia stem cells (LSC). This model has significant implications for the development of novel therapies, but its clinical significance remains unclear. Objective: To measure associations between a leukemic stem cell expression signature and clinical outcomes in AML. Design, Setting, and Patients: We defined a gene expression signature of LSC-enriched subpopulations from primary AML patient samples and xenografts, based on a functional definition in transplantation assays. Using previously published gene expression data of bulk AML from four independent cohorts totaling 1047 patients, we performed a retrospective cohort study, defining an LSC score and evaluating both its associations with known predictors of risk, including cytogenetic subtype and molecular mutations, and its value as an independent prognostic factor. Main Outcome Measures: Reproducible associations between a leukemic stem cell signature and overall, event-free, and relapse-free survival. Results: The LSC score was similar across most AML subtypes but was lower in acute promyelocytic leukemia and in prognostically favorable cases harboring NPM1 or CEBPA mutations. Strikingly, high scores were associated with inferior overall (OS), event-free (EFS), and relapse-free survival (RFS) in these independent cohorts, whether considering patients with a normal karyotype [hazard ratio (HR) range for OS 1.13-1.18, p<0.012 in all cases] or those with cytogenetic anomalies (HR range for OS 1.07-1.15, p<0.01 in all cases).
In multivariate analysis, the LSC score was associated with poor outcomes independently of age, FLT3 or NPM1 mutations, and cytogenetic risk group, and added to their prognostic value. Conclusions: High expression of a leukemic stem cell gene expression signature is independently associated with adverse outcomes in AML.

Cellular fractionation and expression profiling of normal and leukemic subsets: Human samples were obtained at Stanford University Medical Center under a protocol approved by the Institutional Review Board, after informed consent. Normal human bone marrow mononuclear cells were purchased from AllCells Inc. (Emeryville, CA), and human cord blood was obtained from Stanford University. For AML specimens, peripheral blood and/or bone marrow were obtained, and gene expression microarray data were generated using Affymetrix U133 Plus 2.0 microarrays from the following populations purified by fluorescence-activated cell sorting: AML LSC (Lin-CD34+CD38-CD90-, n=7), AML LPC (Lin-CD34+CD38+, n=7), AML blasts (Lin-CD34-), normal hematopoietic stem cells (HSC, Lin-CD34+CD38-CD90+CD45RA-; bone marrow and cord blood, n=7), multipotent progenitors (Lin-CD34+CD38-CD90-CD45RA-; bone marrow and cord blood, n=7), common myeloid progenitors (Lin-CD34+CD38+CD123+CD45RA-; bone marrow, n=4), granulocyte-monocyte progenitors (Lin-CD34+CD38+CD123+CD45RA+; bone marrow, n=4), and megakaryocyte-erythrocyte progenitors (Lin-CD34+CD38+CD123-CD45RA-; bone marrow, n=4). Raw data were deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO, accession GSE24006). Detailed methods for purification of cellular subsets and clinical features of the corresponding AML patients have been reported previously.

Microarray analysis and definition of LSC signature: Fourteen samples (7 paired LSC and LPC samples) from the 7 patients described above were combined with 16 paired samples (8 LSC and 8 LPC) from an independent study to produce a single dataset for analysis.
Individual genes differentially expressed between paired LSC and LPC samples were identified using Significance Analysis of Microarrays (SAM) with a paired metric (false discovery rate <10%). The ‘LSC signature’ in a given dataset was defined as the first principal component of these genes across the samples of that dataset, that is, the linear weighted sum of gene expression values that captures the largest possible proportion of their total variance. The LSC signature was evaluated across all purified subpopulations described above. To identify biological themes distinguishing LSC from LPC, all genes on the microarrays were ranked by the geometric mean difference in their expression between paired LSC and LPC samples and evaluated using Gene Set Enrichment Analysis (GSEA).

Raw microarray data were obtained as Affymetrix CEL files for four publicly available bulk AML gene expression studies: three from NCBI GEO (GSE12417, n=163, normal-karyotype AML only, with OS outcomes; GSE10358, n=184, OS and EFS; GSE14468, n=527, OS, EFS and RFS) and one from the National Cancer Institute caArray database (accession willm-00119, n=170, non-FAB M3, OS only). Matrices of the re-analyzed data are provided as supplementary files. Ingenuity Pathways Analysis was used to identify interaction networks among genes.
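The paired ranking and the first-principal-component score described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes normalized log2 expression matrices with samples in rows and treats the "geometric mean difference" as the mean paired log2 difference, which equals the log2 of the geometric mean fold change. Genes are centered but not scaled. The function names and the `direction` vector (+1 for genes higher in LSC, -1 for genes lower in LSC, used only to fix the arbitrary sign of the component) are hypothetical. SAM's permutation-based FDR is not reimplemented here; `gene_idx` simply stands in for the SAM-selected genes.

```python
import numpy as np

# Hypothetical helpers illustrating the analysis; not the authors' code.
# Assumes normalized log2 expression with samples in rows and genes in columns.


def paired_log2_difference(lsc, lpc):
    """Mean paired log2(LSC) - log2(LPC) difference for each gene.

    Row i of `lsc` and row i of `lpc` must come from the same patient.
    On the log2 scale, this mean equals log2 of the geometric mean fold change.
    """
    lsc = np.asarray(lsc, dtype=float)
    lpc = np.asarray(lpc, dtype=float)
    if lsc.shape != lpc.shape:
        raise ValueError("LSC and LPC matrices must be paired row for row")
    return (lsc - lpc).mean(axis=0)


def rank_genes(lsc, lpc):
    """Gene indices ordered from most LSC-enriched to most LPC-enriched."""
    return np.argsort(-paired_log2_difference(lsc, lpc), kind="stable")


def lsc_score(expr, gene_idx, direction):
    """First principal component of the signature genes within one dataset.

    expr:      (n_samples, n_genes) log2 expression for one cohort.
    gene_idx:  column indices of the signature genes (assumed SAM-selected).
    direction: +1 for genes higher in LSC, -1 for genes lower in LSC
               (hypothetical input; only fixes the sign of the component).
    Returns per-sample scores, gene loadings, and fraction of variance explained.
    """
    x = np.asarray(expr, dtype=float)[:, gene_idx]
    x = x - x.mean(axis=0)                   # center each gene across samples
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[0]                         # unit-length weights of PC1
    if np.dot(loadings, np.asarray(direction, dtype=float)) < 0:
        loadings = -loadings                 # orient so high score = LSC-like
    scores = x @ loadings                    # weighted sum of centered genes
    explained = s[0] ** 2 / np.sum(s ** 2)   # share of total variance on PC1
    return scores, loadings, explained
```

Because the component is recomputed within each dataset, the gene weights can differ between cohorts; in this sketch only the gene set and the sign convention carry over from one dataset to the next.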