ABSTRACT: PURPOSE: To identify a gene signature for the prognosis of breast cancer using high-throughput analysis. METHODS: RNASeq, single nucleotide polymorphism (SNP), and copy number variation (CNV) data, together with clinical follow-up information, were downloaded from The Cancer Genome Atlas (TCGA), and the samples were randomly divided into a training set and a verification (test) set. Genes related to breast cancer prognosis and differentially expressed genes (DEGs) with CNV or SNP were screened from the training set, then integrated for feature selection with RandomForest to identify robust biomarkers. Finally, a gene-related prognostic model was established, and its performance was verified in the TCGA test set, a Gene Expression Omnibus (GEO) validation set, and breast cancer subtypes. RESULTS: A total of 2287 prognosis-related genes, 131 genes with amplified copy numbers, 724 genes with copy number deletions, and 280 genes with significant mutations screened from genomic variants were closely correlated with the development of breast cancer. A total of 120 candidate genes were obtained by integrating the genes from genomic variants with those related to prognosis, and 6 characteristic genes (CD24, PRRG1, IQSEC3, MRGPRX, RCC2, and CASP8) were top-ranked by RandomForest feature selection. Notably, several of these have previously been reported to be associated with the progression of breast cancer. Cox regression analysis was performed to establish a 6-gene signature, which stratified the risk of samples from the training set, test set, and external validation set. Moreover, the five-year survival AUC of the model exceeded 0.65 in both the training set and the validation set. Thus, the 6-gene signature developed in the current study could serve as an independent prognostic factor for breast cancer patients.
CONCLUSION: This study constructed a 6-gene signature as a novel prognostic marker for predicting the survival of breast cancer patients, providing new diagnostic/prognostic biomarkers and therapeutic targets for breast cancer.
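
As a rough illustration of how a Cox-derived gene signature is typically used to stratify patients, the sketch below computes a linear risk score (the sum of coefficient times expression over the six genes) and splits samples into high- and low-risk groups. The coefficient values are hypothetical placeholders, since the abstract does not report the fitted values, and the median cutoff is an assumed convention rather than the authors' stated threshold.

```python
from statistics import median

# Hypothetical Cox coefficients, for illustration only.
# The abstract names the six genes but does not report fitted coefficients.
COEFS = {
    "CD24": 0.2,
    "PRRG1": 0.1,
    "IQSEC3": 0.3,
    "MRGPRX": -0.1,
    "RCC2": 0.25,
    "CASP8": -0.3,
}


def risk_score(expression, coefs=COEFS):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(samples, coefs=COEFS):
    """Score every sample and split at the median score (assumed cutoff).

    `samples` maps a sample ID to a dict of gene -> expression value.
    Returns (scores, cutoff, groups), where groups maps ID -> "high" or "low".
    """
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups
```

In practice the coefficients would come from a Cox model fitted on the training set, and the same coefficients and cutoff would then be applied unchanged to the test and external validation sets.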