ABSTRACT: High-grade osteosarcoma is a tumor with a complex genomic profile, occurring primarily in adolescents, with a second peak at middle age. The extensive genomic alterations obscure the identification of genes driving tumorigenesis during osteosarcoma development. To identify such driver genes, we integrated DNA copy number profiles (Affymetrix SNP 6.0) of 32 diagnostic biopsies with 84 expression profiles (Illumina Human-6 v2.0) of high-grade osteosarcoma, with expression compared against that of its putative progenitor cells, i.e. mesenchymal stem cells (n=12) or osteoblasts (n=3). In addition, we performed paired analyses between copy number and expression profiles of a subset of 29 patients for whom both DNA and mRNA profiles were available. Integrative analyses were performed in Nexus Copy Number software and the statistical language R. Paired analyses were performed on all probes detecting significantly differentially expressed genes in the corresponding LIMMA analyses. For both non-paired and paired analyses, the copy number aberration frequency threshold was set to >35%. Non-paired and paired integrative analyses resulted in 45 and 101 genes, respectively, that were identified with both control sets. Paired analyses detected >90% of all genes found with the corresponding non-paired analyses and, remarkably, yielded approximately twice as many genes in total. Affected genes were intersected with genes differentially expressed in osteosarcoma cell lines, resulting in 31 new osteosarcoma driver genes. Cell division related genes, such as MCM4 and LATS2, were overrepresented, and genomic instability was predictive of metastasis-free survival, suggesting that deregulation of the cell cycle is a driver of osteosarcomagenesis.
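The gene-list bookkeeping summarized in the abstract (keeping genes found with both control sets, checking what fraction of non-paired genes the paired analysis recovers, and intersecting with genes differentially expressed in cell lines) reduces to set operations. The sketch below is illustrative only: the function names and gene names are hypothetical, and treating the "affected genes" as the union of the non-paired and paired results is an assumption.

```python
def found_with_both_controls(msc_hits: set[str], osteoblast_hits: set[str]) -> set[str]:
    """Genes returned both when tumors are compared with MSCs and with osteoblasts."""
    return msc_hits & osteoblast_hits


def recovery_fraction(reference: set[str], candidate: set[str]) -> float:
    """Fraction of reference genes (e.g. non-paired hits) also found in candidate."""
    return len(reference & candidate) / len(reference) if reference else 0.0


# Hypothetical toy gene lists, not results from this study
nonpaired = found_with_both_controls({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_A", "GENE_B", "GENE_D"})
paired = found_with_both_controls(
    {"GENE_A", "GENE_B", "GENE_E", "GENE_F"}, {"GENE_A", "GENE_B", "GENE_E", "GENE_G"}
)
cell_line_de = {"GENE_A", "GENE_E", "GENE_X"}

print(recovery_fraction(nonpaired, paired))  # 1.0
# Assumption: candidate drivers = (non-paired OR paired hits) AND cell-line DE genes
candidates = (nonpaired | paired) & cell_line_de
print(sorted(candidates))  # ['GENE_A', 'GENE_E']
```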
This SuperSeries is composed of the following subset Series:
GSE28974: Genome-wide gene expression profiling of mesenchymal stem cells
GSE33153: Copy number analysis of high-grade osteosarcoma
GSE33382: Genome-wide gene expression analysis of high-grade osteosarcoma

For data processing, we refer to the individual Series.

We performed both non-paired and paired integrative analyses on SNP array and gene expression data. The non-paired integrative analysis was performed by importing lists of differentially expressed genes into the Copy Number module of Nexus software version 5 (BioDiscovery, CA). Based on the length of the gene list, Nexus software performs a Fisher's exact test to determine whether the number of differentially expressed genes in a specific region with a significant copy number alteration is larger than expected by chance. Genes located in such regions with FDR-adjusted P-values (Q-bounds in Nexus software) < 0.05 were returned from this integrative analysis. Nexus software only reports genes that are both gained and overexpressed, or both deleted and downregulated.

For the paired integrative analysis, copy number data for all autosomal genes represented on both the copy number and the gene expression arrays were exported from Nexus software and converted into a binary file coding each gene as gained (1) or not gained (0), and a similar binary file for losses. As in the non-paired integrative analysis, we did not restrict the size of copy number alterations. Gene expression data of each probe for each sample were normalized against the average expression of the corresponding probe over all control samples (either the 12 MSCs or the 3 osteoblasts). Because the expression values are log-transformed, this was done by subtracting the average expression of the control samples from the expression level of the sample of interest, which yields a log fold change relative to the control set.
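The enrichment test behind the non-paired integrative analysis described above can be sketched as a one-sided Fisher's exact test, i.e. the upper tail of the hypergeometric distribution for the 2x2 table of genes inside/outside a region versus differentially expressed or not. This is a minimal stand-in, not Nexus's implementation: the function names and region counts are hypothetical, and the Benjamini-Hochberg procedure is used here as an assumed approximation of the FDR adjustment behind Nexus "Q-bounds".

```python
from math import comb


def enrichment_p_value(n_genes: int, n_de: int, region_size: int, de_in_region: int) -> float:
    """One-sided Fisher's exact test: P(X >= de_in_region), X ~ Hypergeometric.

    n_genes: genes on the array; n_de: length of the imported DE gene list;
    region_size: genes in the copy number region; de_in_region: DE genes in it.
    """
    upper = min(region_size, n_de)
    tail = sum(
        comb(n_de, i) * comb(n_genes - n_de, region_size - i)
        for i in range(de_in_region, upper + 1)
    )
    return tail / comb(n_genes, region_size)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical counts: 1000 genes on the array, 100 of them differentially expressed.
# Each region maps to (genes in region, DE genes in region).
regions = {"region_1": (20, 10), "region_2": (20, 2), "region_3": (50, 6)}
p_values = [enrichment_p_value(1000, 100, size, k) for size, k in regions.values()]
q_values = benjamini_hochberg(p_values)
significant = [name for name, q in zip(regions, q_values) if q < 0.05]
print(significant)  # ['region_1']
```

The test itself ignores direction; as stated above, Nexus additionally reports only genes whose copy number change and expression change agree (gain with overexpression, loss with downregulation).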
For both analyses, only genes that were significantly differentially expressed between the 84 osteosarcoma samples and the specific control set were analyzed, to ensure that every gene returned from the integrative analysis was significantly differentially expressed. Subsequently, genes with an alteration in the copy number binary files whose expression fold change matched the direction of that alteration (upregulation for genes with gains, downregulation for genes with losses) were returned.
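The paired procedure (normalization to the control mean, binary gain/loss calls, restriction to differentially expressed genes, and direction matching) can be sketched in a few lines. This is a hypothetical illustration, not the R code used for the analysis: the data layout (one value per gene rather than per probe), the function names, the toy data, and in particular how the >35% aberration frequency from the abstract is applied (here, as the fraction of paired samples with a concordant alteration) are all assumptions.

```python
from statistics import mean


def normalize_to_controls(tumor_expr, control_expr):
    """Log fold change per gene: tumor log expression minus mean control log expression."""
    control_mean = {
        gene: mean(sample[gene] for sample in control_expr.values())
        for gene in next(iter(control_expr.values()))
    }
    return {
        sample: {g: v - control_mean[g] for g, v in values.items() if g in control_mean}
        for sample, values in tumor_expr.items()
    }


def paired_integration(gains, losses, log_fc, de_genes, min_frequency=0.35):
    """Return DE genes whose concordant alteration frequency exceeds min_frequency.

    gains/losses: sample -> gene -> 1 (altered); absent genes count as 0.
    log_fc: sample -> gene -> log fold change versus the control set.
    """
    samples = [s for s in log_fc if s in gains and s in losses]  # paired patients only
    if not samples:
        return {}
    hits = {}
    for gene in sorted(de_genes):  # only genes significant in the LIMMA analysis
        up_with_gain = sum(
            1 for s in samples if gains[s].get(gene) == 1 and log_fc[s].get(gene, 0.0) > 0
        )
        down_with_loss = sum(
            1 for s in samples if losses[s].get(gene) == 1 and log_fc[s].get(gene, 0.0) < 0
        )
        if up_with_gain / len(samples) > min_frequency:
            hits[gene] = "gain/up"
        elif down_with_loss / len(samples) > min_frequency:
            hits[gene] = "loss/down"
    return hits


# Hypothetical toy data: log2 expression for three paired patients and two controls
controls = {"C1": {"G1": 5.0, "G2": 8.0, "G3": 6.0}, "C2": {"G1": 7.0, "G2": 8.0, "G3": 6.0}}
tumors = {
    "P1": {"G1": 8.0, "G2": 6.0, "G3": 6.5},
    "P2": {"G1": 7.5, "G2": 7.0, "G3": 5.0},
    "P3": {"G1": 5.0, "G2": 9.0, "G3": 6.0},
}
gains = {"P1": {"G1": 1, "G3": 1}, "P2": {"G1": 1}, "P3": {}}
losses = {"P1": {"G2": 1}, "P2": {}, "P3": {"G2": 1}}

log_fc = normalize_to_controls(tumors, controls)
hits = paired_integration(gains, losses, log_fc, de_genes={"G1", "G2"})
print(hits)  # {'G1': 'gain/up'}
```

In this toy example G2 is lost in two patients, but only one of them shows downregulation (1/3 of samples, below the threshold), so it is not returned; G3 is never considered because it is not in the differentially expressed set.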