Project description:Background: The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Now, other structural variants, such as copy number variants or DNA inversions, either germ-line or in mosaicism events, are being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (as well as Genome-Wide SNP 5.0/6.0 and Axiom arrays) in structural variant studies. Results: We illustrate the capabilities of affy2sv using two different complete pipelines on real data. The first performs a GWAS and a mosaic alteration detection study, and the other detects CNVs and performs inversion calling. Conclusion: Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies where different types of structural variants are considered.
Project description:Molecular profiling of primary renal diffuse large B cell lymphoma unravels a proclivity for immune-privileged organ-tropism. Here, we used Affymetrix OncoScan CNV arrays to characterise somatic copy number variations in 29 prDLBCL cases.
Project description:BACKGROUND: Copy number variation (CNV) is essential to understand the pathology of many complex diseases at the DNA level. Affymetrix SNP arrays, which are widely used for CNV studies, depend significantly on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization and training sample batch, as well as genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Since many available algorithms address only one or two of the three factors, a high false discovery rate (FDR) often results when identifying CNV. Therefore, we have developed a new CNV detection pipeline which is based on hybridization and amplification rate correction (CNVhac). METHODS: CNVhac first estimates the allelic concentrations (ACs) of target sequences by using sample-independent parameters trained through the physicochemical hybridization law. Then the raw CN is estimated by taking the ratio of the AC to the corresponding average AC from a reference sample set for one specific site. Finally, a hidden Markov model (HMM) segmentation process is implemented to detect CNV regions. RESULTS: Based on public HapMap data, the results show that CNVhac effectively smooths the genomic waves and facilitates more accurate raw CN estimates compared to other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and enables CNV calling with appreciably low FDRs. CONCLUSION: CNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.
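The raw CN step described in the CNVhac abstract (the ratio of a sample's AC to the average AC of a reference set at the same site) can be illustrated with a minimal sketch. This is not the CNVhac implementation: the function name `raw_copy_number`, the input layout, and the scaling by a diploid baseline of 2 copies are assumptions made only for illustration, and the AC estimation and HMM segmentation steps are not shown.

```python
from statistics import mean


def raw_copy_number(sample_ac, reference_acs, baseline_copies=2.0):
    """Per-site raw CN as a ratio to the reference-set average AC.

    sample_ac:       list of allelic concentrations, one per site
    reference_acs:   list of lists, one AC list per reference sample
    baseline_copies: assumed copy number of the reference set (hypothetical
                     diploid scaling, not stated in the abstract)
    """
    raw_cn = []
    for site, ac in enumerate(sample_ac):
        # Average AC across all reference samples at this site
        ref_mean = mean(ref[site] for ref in reference_acs)
        raw_cn.append(baseline_copies * ac / ref_mean)
    return raw_cn


if __name__ == "__main__":
    reference = [[1.0, 2.0, 4.0], [1.0, 2.0, 4.0]]  # toy values, not real data
    sample = [1.0, 1.0, 6.0]
    print(raw_copy_number(sample, reference))  # [2.0, 1.0, 3.0]
```

In this toy input the second site looks like a one-copy loss and the third like a one-copy gain relative to the assumed diploid baseline; a segmentation step would then group neighbouring sites with similar raw CN.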
Project description:Desmoid tumors are bland fibroblastic tumors with little histologic variation in different regions of the tumor. While desmoid tumors do not metastasize, they have a high rate of local recurrence after complete resection and no reliable predictors of clinical behavior exist. The presence of molecular intra- and inter-tumor heterogeneity has been well established in other, higher grade, sarcomas but little is known about molecular variability within histologically bland lesions. In this study, we sought to examine the extent of intra- and inter-tumoral clonal heterogeneity of desmoid tumors, which may contribute to their pathogenesis and possible relapse. We performed analysis of DNA methylation, DNA copy number alterations, point mutations and gene expression on 24 specimens from different areas from primary and recurrent desmoid tumors from 3 patients (7-9 specimens per patient). The studies showed a remarkable heterogeneity of DNA methylation, DNA copy number alterations, point mutations or gene expression in different regions in primary or recurrent tumors in each patient. We discovered the evidence for subclonal alterations in different areas of individual tumors. Among the four types of data, the transcriptomic profiles showed the highest degree of variability within tumors and between different tumors from the same patient. Gene expression signatures associated with favorable and unfavorable outcome were detected in different regions within the same tumor. This study shows an unexpected degree of intra- and inter-tumor heterogeneity in desmoid tumors. Our analysis indicates that even in this histologically monotonous lesion, molecular analysis of a single tumor biopsy may underestimate the magnitude of molecular alterations. We demonstrate that molecular intra- and inter-tumor heterogeneity is an important consideration in drug development and validation of prognostic and predictive biomarkers for these tumors.
Project description:The reliability of differential gene expression analysis on formalin-fixed, paraffin-embedded (FFPE) expression profiles generated using Affymetrix arrays is questionable, due to the high range of percent-present values reported in studies which profiled FFPE samples using this technology. Moreover, the validity of gene-modules derived from external datasets in FFPE microarray expression profiles is unknown. By generating matched gene expression profiles using RNAs derived from fresh-frozen (FF) and FFPE preserved breast tumors with Affymetrix arrays and FF/FFPE RNA specific amplification-and-labeling kits, the reliability of differential expression analysis and the validity of gene modules derived from external datasets were investigated. Specifically, the reliability of differential expression analysis was investigated by developing de-novo ER/HER2 pathway gene-modules from the matched datasets and validating them on external FF/FFPE gene expression datasets using ROC analysis. Spearman's rank correlation coefficient of module scores between matched FFPE/frozen datasets was used to measure the reliability of gene-modules derived from external datasets in FFPE expression profiles. Independent of the array/amplification-kit/sample preservation method used, de-novo ER/HER2 gene-modules derived from all matched datasets showed similar prediction performance in the independent validation (AUC range in FFPE dataset; ER: 0.93-0.95, HER2: 0.85-0.91), except for the de-novo ER/HER2 gene-module derived from the FFPE dataset using the 3'IVT kit (AUC range in FFPE dataset; ER: 0.79-0.81, HER2: 0.78). Among the external gene modules considered, roughly 50% of the gene modules showed high concordance between expression profiles derived from matching FF and FFPE RNA. The remaining discordant gene modules between FF and FFPE expression profiles showed high concordance within matching FF datasets and within matching FFPE datasets independently, implying that microarrays still require improved amplification-and-sample-preparation protocols for deriving 100% concordant expression profiles from matching FF and FFPE RNA.
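The concordance measure named in this abstract, Spearman's rank correlation of module scores between matched FF and FFPE profiles, can be sketched as follows. This is a generic textbook implementation (Pearson correlation of average ranks), not the authors' code; the function names and the toy module scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores of one gene module in four matched tumors
    ff_scores = [0.1, 0.5, 0.9, 1.3]
    ffpe_scores = [0.2, 0.4, 1.1, 1.0]
    print(round(spearman(ff_scores, ffpe_scores), 3))  # 0.8
```

A module whose scores rank the tumors in nearly the same order in both preservation types yields a coefficient close to 1; what counts as "high concordance" is a threshold the abstract does not specify.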
Project description:Genome-wide copy number analysis using SNP arrays (OncoScan) in multinodular goitres from individuals with the c.1552G>A;p.E518K mutation in DGCR8 shows allelic imbalance at Chr22 in all samples. Likewise, this event is confirmed in papillary thyroid tumors harboring the same alteration somatically. The only alteration common to all MNG and FvPTC samples was the allelic imbalance at Chr22, in line with all samples showing a homozygous genotype at the DGCR8 locus.
Project description:Epstein-Barr virus (EBV) associated diffuse large B-cell lymphoma (DLBCL) represents a rare aggressive B-cell lymphoma subtype characterized by an adverse clinical outcome. EBV infection of lymphoma cells has been associated with different lymphoma subtypes, while the precise role of EBV in lymphomagenesis and the specific molecular characteristics of these lymphomas remain elusive. To further unravel the biology of EBV associated DLBCL, we present a comprehensive molecular analysis of a total of 60 primary EBV positive (EBV+) DLBCLs using targeted sequencing of cancer candidate genes (CCGs) and genome-wide determination of recurrent somatic copy number alterations (SCNAs) in 46 cases, respectively. Applying the LymphGen classifier 2.0, we found that less than 20% of primary EBV+ DLBCLs correspond to one of the established molecular DLBCL subtypes, underscoring the unique biology of this entity. We have identified recurrent mutations activating the oncogenic JAK-STAT and NOTCH pathways as well as frequent amplifications of 9p24.1 contributing to immune escape by PD-L1 overexpression. Our findings enable further functional preclinical and clinical studies exploring the therapeutic potential of targeting these aberrations in patients with EBV+ DLBCL to improve outcome.
Project description:Background: Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms. Results: Application of the spatial bias adjustments on HapMap samples resulted in higher genotype call rates with equal or even better accuracy for thousands of SNPs. Consequently, the normalization procedure is expected to lead to more meaningful biological inferences and could be valuable for genome-wide SNP analysis. Conclusions: Spatial normalization can potentially rescue thousands of SNPs in a genetic study at the small cost of computational time. The approach is implemented in R and available from the authors upon request.
Project description:Blastic plasmacytoid dendritic cell neoplasm (BPDCN) is an aggressive malignancy assumed to originate from plasmacytoid dendritic cells (pDCs), which affects the skin and bone marrow and sequentially other organ systems. Here, we used Affymetrix OncoScan CNV arrays to characterize somatic copy number variations in 45 BPDCN cases.
Project description:Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current tools for calling CNVs are still prone to extensive false positive calls when applied to biobank-scale arrays. Moreover, there is a lack of methods exploiting cohorts with trios available (e.g. nuclear family) to assist in quality control and downstream analyses following the calling. We developed SeeCiTe (Seeing Cnvs in Trios), a novel CNV quality control tool that post-processes output from current CNV calling tools, exploiting child-parent trio data to classify calls in quality categories and provide a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father, and Child Cohort Study (MoBa) and show that SeeCiTe improves the specificity and sensitivity compared to the common empiric filtering strategies. To our knowledge, it is the first tool that utilizes probe-level CNV data in trios (and singletons) to systematically highlight potential artefacts and visualize signal intensities in a streamlined fashion suitable for biobank-scale studies. The software is implemented in R with the source code freely available at https://github.com/aksenia/SeeCiTe. Supplementary data are available at Bioinformatics online.