Project description:<p> This study is part of the '<i>First 1,000 Days of Life and Beyond</i>' study at the Inova Translational Medicine Institute. Whole-genome sequencing data from 1,291 parent-offspring trios was used to study the properties of clustered <i>de novo</i> mutations. The maternal clusters were found to be enriched in regions with accelerated maternal mutation rate and show distinct mutational signatures. </p> <p>For additional details, please refer to: "<i>Germline de novo mutation clusters arise during oocyte aging in genomic regions with increased double-strand break incidence</i>". Jakob M. Goldmann, Vladimir B. Seplyarskiy, Wendy S.W. Wong, Thierry Vilboux, Pieter B. Neerincx, Dale L. Bodian, Benjamin D. Solomon, Joris A. Veltman, John F. Deeken, Christian Gilissen, John E. Niederhuber. <a href="https://www.ncbi.nlm.nih.gov/pubmed/29507425">Nature Genetics</a>. </p>
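The description above does not define what makes de novo mutations "clustered". The following is a minimal sketch of one common way to group them, under stated assumptions:

- Two mutations in the same individual and on the same chromosome are linked when they lie within a hypothetical `max_gap` of each other.
- The default `max_gap` of 20 kb is an assumption, not taken from this description.
- The input tuple layout is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (individual_id, chromosome, position) per de novo mutation.
DNM = Tuple[str, str, int]


def find_clusters(dnms: List[DNM], max_gap: int = 20_000) -> List[List[DNM]]:
    """Single-linkage grouping of one individual's DNMs per chromosome.

    Consecutive mutations (sorted by position) within `max_gap` bp of each
    other share a group; only groups with two or more mutations are returned.
    The 20 kb default is an assumed threshold for illustration.
    """
    by_key: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for ind, chrom, pos in dnms:
        by_key[(ind, chrom)].append(pos)

    clusters: List[List[DNM]] = []
    for (ind, chrom), positions in sorted(by_key.items()):
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)  # still within the gap: extend the group
            else:
                if len(current) >= 2:
                    clusters.append([(ind, chrom, p) for p in current])
                current = [pos]  # start a new group
        if len(current) >= 2:
            clusters.append([(ind, chrom, p) for p in current])
    return clusters
```

Mutations from different individuals are never merged, because clustering is evaluated per individual.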
Project description:Comparison of whole-genome exome array CGH with a commercial SNP array for the detection of de novo and homozygous copy number variants (CNVs) in 99 autism simplex trios. This description will be updated once the manuscript is prepared.
Project description:Motivation: Whole-genome and whole-exome sequencing of parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates due to sequencing artifacts and alignment issues, and may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge. Results: In this article, we curated 59 sequence features from the whole-genome and exome alignment context that are considered relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set from experimentally validated true and false de novo mutations, as well as false de novo mutations collected from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated the relative importance of different sequence features for classification accuracy.
Finally, we applied DNMFilter to our in-house whole-exome trios and to one CEU trio from the 1000 Genomes Project. We found that DNMFilter can be coupled with commonly used de novo mutation detection approaches as an effective filter that significantly reduces the false discovery rate without sacrificing sensitivity. Availability: DNMFilter, implemented in a combination of Java and R, is freely available at http://humangenome.duke.edu/software.
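DNMFilter itself is written in Java and R, and its 59 features are not listed here. The following is only a minimal pure-Python sketch of the named technique, under stated assumptions:

- It implements gradient boosting for binary classification under log loss, with regression stumps as base learners.
- Each round fits a stump to the negative gradient (label minus predicted probability).
- The two example features (child alternate-allele fraction, total parental alternate reads) and all parameter values are hypothetical.

```python
import math
from typing import List, Tuple

# (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares regression stump fitted to residuals r."""
    n, best = len(X), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate split between adjacent distinct values
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (j, t, lv, rv))
    if best is None:  # every feature is constant: fall back to a constant prediction
        mean = sum(r) / n
        return (0, math.inf, mean, mean)
    return best[1]


def stump_predict(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def fit_gbm(X, y, n_rounds: int = 50, lr: float = 0.3):
    """Boosted stumps under log loss; y must contain both classes (0 = artifact, 1 = true DNM)."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial log-odds
    F = [f0] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient of log loss
        s = fit_stump(X, r)
        stumps.append(s)
        F = [fi + lr * stump_predict(s, x) for fi, x in zip(F, X)]
    return f0, lr, stumps


def predict_proba(model, x: List[float]) -> float:
    """Probability that candidate x is a true de novo mutation."""
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_predict(s, x) for s in stumps))
```

A production classifier would use deeper trees, shrinkage tuning, and held-out validation. The stump version only shows the boosting loop.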
Project description:We describe a multiple de novo CNV (MdnCNV) phenomenon in which individuals with genomic disorders carry five to ten constitutional de novo CNVs. Five such families are studied, consisting of four trios and one singleton. Various array platforms are used to interrogate these families to identify de novo CNVs.
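For the trios, a child CNV can be labelled de novo when no parent carries a matching CNV. The following is a minimal sketch, under stated assumptions:

- A child CNV counts as inherited if it overlaps a parental CNV of the same type (deletion or duplication) on the same chromosome.
- The interval representation and this overlap rule are hypothetical.
- The default cutoff of five follows the "five to ten" range above.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end, type), closed interval, type "DEL" or "DUP".
CNV = Tuple[str, int, int, str]


def overlaps(a: CNV, b: CNV) -> bool:
    """Same chromosome, same CNV type, and intersecting intervals."""
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]


def de_novo_cnvs(child: List[CNV], mother: List[CNV], father: List[CNV]) -> List[CNV]:
    """Child CNVs with no same-type overlapping CNV in either parent (assumed rule)."""
    parents = mother + father
    return [c for c in child if not any(overlaps(c, p) for p in parents)]


def is_mdncnv(child: List[CNV], mother: List[CNV], father: List[CNV], min_count: int = 5) -> bool:
    """Flag an individual carrying at least `min_count` de novo CNVs."""
    return len(de_novo_cnvs(child, mother, father)) >= min_count
```

This rule needs both parents, so it applies to the four trios but not to the singleton family.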
Project description:To study parent-of-origin effects on gene expression, we performed RNA-seq analysis (100 bp single-end reads) of 165 children who were part of mother/father/child trios for which genotype data were available from the HapMap and/or 1000 Genomes Projects. Based on phased genotypes at heterozygous SNP positions, we generated allelic counts for expression of the maternal and paternal alleles in each individual. This analysis reveals significant bias in the expression of the parental alleles for dozens of genes, including both previously known and novel imprinted transcripts.
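The allelic-count step can be sketched as follows: assign reads at each phased heterozygous SNP to the maternal or paternal allele, sum per gene, and test the maternal fraction against 0.5. This is a minimal sketch, under stated assumptions:

- The input layout is hypothetical.
- Per-gene summation is assumed.
- The exact two-sided binomial test is one reasonable choice, not necessarily the study's method.

```python
from math import comb
from typing import Dict, List, Tuple

# Hypothetical per-SNP record: (maternal allele, paternal allele, read counts by base).
PhasedSNP = Tuple[str, str, Dict[str, int]]


def allelic_counts(snps: List[PhasedSNP]) -> Tuple[int, int]:
    """Sum reads supporting the maternal and paternal alleles across a gene's SNPs."""
    mat = pat = 0
    for m_allele, p_allele, counts in snps:
        if m_allele == p_allele:
            continue  # homozygous site: cannot distinguish parental alleles
        mat += counts.get(m_allele, 0)
        pat += counts.get(p_allele, 0)
    return mat, pat


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k.

    Uses plain floating-point probabilities, so it is intended for modest n.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    pk = probs[k]
    return min(1.0, sum(q for q in probs if q <= pk * (1 + 1e-7)))


def parental_bias(snps: List[PhasedSNP]) -> Tuple[float, float]:
    """Return (maternal fraction, p-value for deviation from 50:50)."""
    mat, pat = allelic_counts(snps)
    n = mat + pat
    return mat / n, binom_two_sided_p(mat, n)
```

Summing over SNPs can count a read twice if it spans several heterozygous sites. A fuller analysis would count each read once.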
Project description:Background: In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo; such CNVs are of interest for their potential functional role in disease. Among the leading array-based methods for discovering de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial, and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios. Results: Our analysis of the oral cleft trios reveals that genomic waves are a substantial source of false positive identifications in the joint HMM, despite the wave correction implemented in PennCNV. In addition, the noise in low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number, referred to as the minimum distance, that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. On simulated data, MinimumDistance identifies fewer false positives on average than PennCNV and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance between PennCNV and MinimumDistance for high-coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR.
Computationally, MinimumDistance is nearly 8-fold faster than the joint HMM in the oral cleft trio study. Conclusions: Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNVs, and that the minimum distance is an effective statistic for reducing the technical variation that contributes to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM, and MinimumDistance is much faster.
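Our reading of the minimum distance, hedged accordingly, is as follows. At each probe, take the offspring-minus-father and offspring-minus-mother log R ratio differences. Keep whichever has the smaller absolute value, with its sign. The consequences are:

- An inherited CNV cancels against the carrier parent and gives a value near zero.
- Only a change absent from both parents stays away from zero.
- Signal shared across the family, such as probe effects, also tends to cancel.

The sketch below computes only the statistic, under stated assumptions:

- The segmentation (circular binary segmentation) and maximum a posteriori steps are not shown.
- Tie-breaking toward the father is an arbitrary assumption.

```python
from typing import List


def minimum_distance(offspring: List[float], father: List[float], mother: List[float]) -> List[float]:
    """Per-probe signed offspring-minus-parent log R ratio difference of smallest magnitude.

    Ties in absolute value are broken toward the father (arbitrary choice in this sketch).
    """
    md = []
    for o, f, m in zip(offspring, father, mother):
        d_f, d_m = o - f, o - m
        md.append(d_f if abs(d_f) <= abs(d_m) else d_m)
    return md


# Probe 1: offspring-only deletion (stays negative).
# Probe 2: deletion shared with the mother (cancels).
# Probe 3: probe effect shared with the father (cancels).
offspring = [0.0, -0.6, -0.6, 0.1]
father = [0.0, 0.0, 0.0, 0.1]
mother = [0.0, 0.0, -0.6, 0.0]
md = minimum_distance(offspring, father, mother)
```

In this toy input only the second probe keeps a nonzero minimum distance. That probe is where the offspring differs from both parents, which is the pattern a de novo deletion is expected to produce.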