ABSTRACT: Background: Understanding the heterogeneous genotypes and phenotypes of prostate cancer is fundamental to improving the way we treat this disease. As yet, there are no validated descriptions of prostate cancer subgroups derived from integrated genomics linked with clinical outcome. Methods: In a study of 482 tumour, benign and germline samples from 259 men with primary prostate cancer, we used integrative analysis of copy number alterations (CNA) and array transcriptomics to identify genomic loci that affect mRNA expression levels, in an expression quantitative trait loci (eQTL) approach. We used these loci to stratify patients into subgroups, which we then associated with future clinical behaviour, and compared the results with either CNA or transcriptomics alone. Findings: We identified five separate patient subgroups with distinct genomic alterations and expression profiles based on 100 discriminating genes in our separate discovery and validation sets of 125 and 99 men. These subgroups consistently predicted biochemical relapse (p=0.0017 and p=0.016, respectively) and were further validated in a third cohort with long-term follow-up (p=0.027). We show the relative contributions of gene expression and copy number data to phenotype, and demonstrate the improved power gained from integrative analyses. We confirm alterations in six genes previously associated with prostate cancer (MAP3K7, MELK, RCBTB2, ELAC2, TPD52, ZBTB4), and also identify 94 genes not previously linked to prostate cancer progression that would not have been detected using either transcript or copy number data alone. We confirm a number of previously published molecular changes associated with high-risk disease, including MYC amplification, and NKX3-1, RB1 and PTEN deletions, as well as over-expression of PCA3 and AMACR, and loss of MSMB in tumour tissue.
A subset of the 100 genes outperforms established clinical predictors of poor prognosis (PSA, Gleason score), as well as previously published gene signatures (p=0.0001). We further show how our molecular profiles can be used for the early detection of aggressive cases in a clinical setting, and to inform treatment decisions. Interpretation: For the first time, this study demonstrates the importance of integrated genomic analyses incorporating both benign and tumour tissue data in identifying molecular alterations, leading to the generation of robust gene sets that are predictive of clinical outcome in independent patient cohorts.

A total of 482 samples from 289 men with prostate cancer from two cohorts were included in this study. The discovery cohort comprised 125 tumour samples from radical prostatectomy (RP), with 118 matched benign samples and 85 matched blood samples. An additional 4 benign samples from men undergoing holmium laser enucleation of the prostate (HoLEP) and 16 radical prostatectomy samples from men with castrate-resistant prostate cancer (CRPC), with 13 matched blood samples, were also included. These were assayed on several platforms, including Illumina HT12v4 gene expression arrays, Illumina OMNI2.5M genotyping arrays and Affymetrix SNP6 genotyping arrays. The validation cohort comprised 103 tumour tissue samples from men with prostate cancer, with 99 matched benign tissue samples and 103 matched blood samples. This datasheet describes samples in the DISCOVERY COHORT only, with complete, quality-controlled Illumina HT12v4 data for 13 CRPC samples, 113 tumour samples and 73 matched benign samples. Extensive clinical metadata are available in the associated publication, Ross-Adams et al. (2015, Suppl. Table 2).