Project description:We analysed DNA from two brain regions (cerebellum, CER and frontal cortex, FC) from 4 Parkinson's disease (PD) and 4 control brains on a custom-designed 8x60k Agilent aCGH array targeted to PD genes. All brain DNA samples were hybridised with Agilent sex-matched reference DNA, and three CER samples were hybridised against the FC of the same brain, with a dye swap in one. Male and female reference DNA were hybridised to each other. The samples were then re-extracted with additional protocols, and hybridisations were performed for two CER samples between DNA extracted from the same CER with different protocols, and for one brain between the newly extracted CER and FC DNA.
Project description:Mitochondrial DNA copy number (mtDNA-CN), a measure of the number of mitochondrial genomes per cell, is a minimally invasive proxy measure for mitochondrial function and has been associated with several aging-related diseases. Although quantitative real-time PCR (qPCR) is the current gold standard method for measuring mtDNA-CN, mtDNA-CN can also be measured from genotyping microarray probe intensities and DNA sequencing read counts. To comprehensively examine the performance of these methods, we use known mtDNA-CN correlates (age, sex, white blood cell count, Duffy locus genotype, incident cardiovascular disease) to evaluate mtDNA-CN calculated from qPCR, two microarray platforms, and whole genome (WGS) and whole exome sequence (WES) data across 1,085 participants from the Atherosclerosis Risk in Communities (ARIC) study and 3,489 participants from the Multi-Ethnic Study of Atherosclerosis (MESA). We observe that mtDNA-CN derived from WGS data is significantly more associated with known correlates than mtDNA-CN from all other methods (p < 0.001). Additionally, mtDNA-CN measured from WGS is on average more significantly associated with traits by 5.6 orders of magnitude and has effect size estimates 5.8 times more extreme than the current gold standard of qPCR. We further investigated the role of DNA extraction method on mtDNA-CN estimate reproducibility and found that mtDNA-CN estimated from cell lysate is significantly less variable than mtDNA-CN estimated from traditional phenol-chloroform-isoamyl alcohol extraction (p = 5.44 × 10^-4) and silica-based column selection (p = 2.82 × 10^-7). In conclusion, we recommend that the field move towards more accurate methods for measuring mtDNA-CN and re-analyze trait associations as more WGS data become available from larger initiatives such as TOPMed.
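As a rough illustration of how mtDNA-CN can be derived from sequencing read depth, the sketch below scales the ratio of mitochondrial to autosomal coverage by the nuclear ploidy. This is a commonly used approximation and not necessarily the estimator used in the study above; the function name, its parameters, and the assumption of a diploid nuclear genome are illustrative.

```python
def mtdna_copy_number(mt_mean_coverage: float,
                      autosomal_mean_coverage: float,
                      ploidy: int = 2) -> float:
    """Approximate mtDNA copies per cell from mean read depths.

    Assumption (not from the source): each cell carries `ploidy` copies of
    the autosomal genome, so autosomal depth / ploidy is the depth
    contributed by a single genome copy.
    """
    if autosomal_mean_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    if mt_mean_coverage < 0:
        raise ValueError("mitochondrial coverage cannot be negative")
    return ploidy * mt_mean_coverage / autosomal_mean_coverage


# Example: 3000x mitochondrial depth against 30x autosomal depth.
estimate = mtdna_copy_number(3000.0, 30.0)
```

In practice, depth estimates would first need filtering for mapping quality and for nuclear sequences of mitochondrial origin, which this sketch ignores.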
Project description:Motivation: DNA copy number variants (CNVs) are gains and losses of segments of chromosomes, and comprise an important class of genetic variation. Recently, various microarray hybridization-based techniques have been developed for high-throughput measurement of DNA copy number. In many studies, multiple technical platforms or different versions of the same platform were used to interrogate the same samples, and it became necessary to pool information across these multiple sources to derive a consensus molecular profile for each sample. An integrated analysis is expected to maximize resolution and accuracy, yet currently there is no well-formulated statistical method to address the between-platform differences in probe coverage, assay methods, sensitivity and analytical complexity. Results: The conventional approach is to apply one of the CNV detection ('segmentation') algorithms to search for DNA segments of altered signal intensity. The results from multiple platforms are combined after segmentation. Here we propose a new method, Multi-Platform Circular Binary Segmentation (MPCBS), which pools statistical evidence across platforms during segmentation, and does not require pre-standardization of different data sources. It involves a weighted sum of t-statistics, which arises naturally from the generalized log-likelihood ratio of a multi-platform model. We show by comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on eight HapMap samples that MPCBS achieves improved spatial resolution, detection power and provides a natural consensus across platforms. We also apply the new method to analyze multi-platform data for tumor samples. Availability: The R package for MPCBS is registered on R-Forge (http://r-forge.r-project.org/) under project name MPCBS. Supplementary information: Supplementary data are available at Bioinformatics online.
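To make the idea of pooling evidence during segmentation more concrete, the following sketch computes a t-like statistic for one candidate segment on each platform and combines them as a weighted sum. It is a generic illustration, not the MPCBS statistic itself: the function names, the per-platform noise level `sigma`, and the unit-norm weighting are all assumptions introduced here.

```python
from math import sqrt
from statistics import mean


def segment_t(values, start, end, sigma):
    """t-like statistic for probes values[start:end] versus all other probes.

    `sigma` is an assumed known noise standard deviation for the platform.
    """
    values = list(values)
    inside = values[start:end]
    outside = values[:start] + values[end:]
    if not inside or not outside:
        raise ValueError("segment must leave probes both inside and outside")
    standard_error = sigma * sqrt(1 / len(inside) + 1 / len(outside))
    return (mean(inside) - mean(outside)) / standard_error


def combined_statistic(platforms, weights):
    """Weighted sum of per-platform statistics for the same genomic segment.

    Each platform is a tuple (values, start, end, sigma) in its own probe
    coordinates. Weights are rescaled to unit Euclidean norm, a hypothetical
    choice that keeps the sum on the scale of a single statistic.
    """
    if len(platforms) != len(weights):
        raise ValueError("one weight per platform is required")
    norm = sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum((w / norm) * segment_t(*p) for p, w in zip(platforms, weights))


# Two hypothetical platforms covering the same region with the same probes.
array_a = ([0, 0, 1, 1, 0, 0], 2, 4, 1.0)
array_b = ([0, 0, 1, 1, 0, 0], 2, 4, 1.0)
pooled = combined_statistic([array_a, array_b], [1.0, 1.0])
```

With two concordant platforms the pooled value exceeds either single-platform statistic, which is the intuition behind combining evidence before, rather than after, calling segments.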
Project description:Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.
Project description:Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance.
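One simple way to look for the kind of GC-dependent coverage variation described above is to group genomic windows by GC content and compare the mean coverage of each group with the overall mean. The sketch below assumes each window is supplied as a (sequence, mean coverage) pair; the function names and the choice of ten GC bins are illustrative and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def gc_bin(sequence: str, n_bins: int = 10) -> int:
    """Index of the GC-content bin for a window, ignoring ambiguous bases."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    if acgt == 0:
        raise ValueError("window contains no unambiguous bases")
    # Integer arithmetic avoids floating-point edge cases at bin boundaries.
    return min(gc * n_bins // acgt, n_bins - 1)


def relative_coverage_by_gc(windows, n_bins: int = 10) -> dict:
    """Mean coverage per GC bin, divided by the mean coverage of all windows."""
    windows = list(windows)
    overall = mean(coverage for _, coverage in windows)
    if overall <= 0:
        raise ValueError("overall coverage must be positive")
    by_bin = defaultdict(list)
    for sequence, coverage in windows:
        by_bin[gc_bin(sequence, n_bins)].append(coverage)
    return {b: mean(c) / overall for b, c in sorted(by_bin.items())}


# Toy input: an AT-only window at 30x and a GC-only window at 10x.
profile = relative_coverage_by_gc([("AAAT", 30.0), ("GGCC", 10.0)])
```

Values well below 1 in the high-GC bins would be consistent with GC-rich losses, although a real analysis would use many windows and account for mappability.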
Project description:Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics-transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein's median copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical protein-to-mRNA ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively reclassify 26-29% of luminal tumors. By adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.
Project description:Whole-genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy-number profiles at the cellular level. We propose SCOPE, a normalization and copy-number estimation method for the noisy scDNA-seq data. SCOPE's main features include the following: (1) a Poisson latent factor model for normalization, which borrows information across cells and regions to estimate bias, using in silico identified negative control cells; (2) an expectation-maximization algorithm embedded in the normalization step, which accounts for the aberrant copy-number changes and allows direct ploidy estimation without the need for post hoc adjustment; and (3) a cross-sample segmentation procedure to identify breakpoints that are shared across cells with the same genetic background. We evaluate SCOPE on a diverse set of scDNA-seq data in cancer genomics and show that SCOPE offers accurate copy-number estimates and successfully reconstructs subclonal structure. A record of this paper's transparent peer review process is included in the Supplemental Information.
Project description:Motivation: Advances in whole-genome single-cell DNA sequencing (scDNA-seq) have led to the development of numerous methods for detecting copy number aberrations (CNAs), a key driver of genetic heterogeneity in cancer. While most of these methods are limited to the inference of total copy number, some recent approaches now infer allele-specific CNAs using innovative techniques for estimating allele-frequencies in low coverage scDNA-seq data. However, these existing allele-specific methods are limited in their segmentation strategies, a crucial step in the CNA detection pipeline. Results: We present SEACON (Single-cell Estimation of Allele-specific COpy Numbers), an allele-specific copy number profiler for scDNA-seq data. SEACON uses a Gaussian Mixture Model to identify latent copy number states and breakpoints between contiguous segments across cells, filters the segments for high-quality breakpoints using an ensemble technique, and adopts several strategies for tolerating noisy read-depth and allele frequency measurements. Using a wide array of both real and simulated datasets, we show that SEACON derives accurate copy numbers and surpasses existing approaches under numerous experimental conditions, and identify its strengths and weaknesses. Availability and implementation: SEACON is implemented in Python and is freely available open-source from https://github.com/NabaviLab/SEACON and https://doi.org/10.5281/zenodo.12727008.
Project description:Aging is a complex process strongly determined by genetics. Previous reports have shown that the genome of neuronal cells displays somatic genomic mosaicism including DNA copy number variations (CNVs). CNVs represent a significant source of genetic variation in the human genome and have been implicated in several disorders and complex traits, representing a potential mechanism that contributes to neuronal diversity and the etiology of several neurological diseases and provides new insights into the normal, complex functions of the brain. Nonetheless, the features of somatic CNV mosaicism in nondiseased elderly brains have not been investigated. In the present study, we demonstrate a highly significant increase in the number of CNVs in nondiseased elderly brains compared to the blood. In two neural tissues isolated from paired postmortem samples (same individuals), we found a significant increase in the frequency of deletions in both brain areas, namely, the frontal cortex and cerebellum. Also, deletions were found to be significantly larger when present only in the cerebellum. The sizes of the variants described here were in the 150-760 kb range, and importantly, nearly all of them were present in the Database of Genomic Variants (common variants). Nearly all evidence of genome structural variation in human brains comes from studies detecting changes in single cells which were interpreted as derived from independent, isolated mutational events. The observations based on array-CGH analysis indicate the existence of an extensive clonal mosaicism of CNVs within and between the human brains revealing a different type of variation that had not been previously characterized.