Project description:In recent years, Next Generation Sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially if rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial to increase precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We have modeled the allele balance distribution for biallelic genotypes in 987 WES samples in order to identify positions recurrently deviating significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we have developed a genotype callability score based on ABB for all positions of the human exome, which detects false positive variant calls that passed state-of-the-art filters. Finally, we demonstrate the use of ABB for detection of false associations proposed by rare variant association studies (RVAS).
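As a rough illustration of what "deviating significantly from the expectation" could mean for allele balance, the sketch below tests heterozygous calls at one position against a simple binomial expectation of 0.5. This is not the ABB model described above: the binomial assumption, the `alpha` cutoff, and the function names `binom_two_sided_p` and `fraction_deviating` are all hypothetical choices made for the example.

```python
from math import comb


def binom_two_sided_p(alt_reads: int, depth: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed alternate-read count (assumed null: allele balance equals p).
    """
    pmf = [comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(depth + 1)]
    observed = pmf[alt_reads]
    # Small relative tolerance so ties are counted despite rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def fraction_deviating(sites, alpha: float = 0.01) -> float:
    """Fraction of samples whose heterozygous call at one position deviates.

    sites: list of (alt_reads, depth) pairs, one per sample (hypothetical input).
    A high fraction would hint at a recurrent, position-specific bias.
    """
    flagged = sum(1 for alt, depth in sites if binom_two_sided_p(alt, depth) < alpha)
    return flagged / len(sites)


if __name__ == "__main__":
    # Made-up read counts for one position across four samples.
    example = [(10, 20), (2, 30), (15, 30), (1, 25)]
    print(fraction_deviating(example))
```

A per-position summary like this only becomes informative across many samples, which is why the cohort size matters in the description above.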
Project description:BACKGROUND: Pre-implantation genetic screening (PGS) has been used in an attempt to determine embryonic aneuploidy. Techniques that use new molecular methods to determine the karyotype of an embryo are expanding the scope of PGS. METHODS: We introduce a new method for PGS, termed “Parental Support” (PS), which leverages microarray measurements from parental DNA to “clean” single cell microarray measurements on embryonic cells and explicitly computes confidence in each copy number call. The method distinguishes mitotic and meiotic copy errors, and determines parental source of aneuploidy. RESULTS: Validation with 459 single cells of known karyotype indicated that per-cell false positive and false negative rates are roughly equivalent to the “gold standard” metaphase karyotype. The majority of the cells were run in parallel with a clinical commercial PGS service. Computed confidences were conservative and roughly concordant with accuracy. To examine ploidy in human embryos, the method was then applied to 26 disaggregated cryopreserved cleavage stage embryos, for a total of 134 single blastomeres. Only 23.1% of the embryos were euploid, though 46.2% of embryos were mosaic euploid. Mosaicism affected 57.7% of the embryos. Counts of mitotic and meiotic errors were roughly equivalent. Maternal meiotic trisomy predominated over paternal trisomy, and maternal meiotic trisomies were negatively predictive of mosaic euploid embryos. CONCLUSIONS: We have performed a major preclinical validation of a new method for PGS and found that the technology performs approximately as well as a metaphase karyotype. We also directly measured the mechanism of aneuploidy in cleavage stage human embryos and found high rates and distinct patterns of mitotic and meiotic aneuploidy.
Project description:Purpose: Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goal of this study is to use mRNA and microRNA sequencing to evaluate the effects of EBV-encoded microRNA BART2-5p mimics on the mRNA and microRNA profiles of the human nasopharyngeal carcinoma cell line 6-10B. Methods: 6-10B cells were transfected with either EBV microRNA BART2-5p mimics or negative control (NC) mimics for 48 hours. Cellular RNA was then sequenced at the mRNA and microRNA levels using an Illumina high-throughput sequencing service (Oebiotech, Shanghai, China). Data were extracted and normalized according to the manufacturer's standard protocol. Results: Up- or down-regulated mRNAs and microRNAs between the control and experimental groups were selected based on log-fold change, with a significance threshold of p<0.05. Conclusions: Our study describes the mRNA and microRNA alterations induced by EBV-miRNA-BART2-5p in 6-10B cells.
Project description:Most proteogenomic approaches for mapping single amino acid polymorphisms (SAPs) require construction of a sample-specific database containing protein variants predicted from the next-generation sequencing (NGS) data. We present a new strategy for direct SAP detection without relying on NGS data. Among the 348 putative SAP peptides identified in an industrial yeast strain, 85.6% of SAP sites were validated by genomic sequencing.
Project description:Logistic regression classification models were fit to manually classified quality control (QC) LC-MS/MS datasets to develop a model that can predict whether a dataset is in or out of control. Model parameters were optimized by minimizing a loss function that accounts for the tradeoff between false positive and false negative errors. In addition to the 1152 training/testing datasets, we are including 2662 additional datasets, all of the same QC sample (whole cell lysate of Shewanella oneidensis). Datasets originate from 6 Thermo instrument platforms: Exactive, LTQ, VelosPro, Orbitrap, Q-Exactive, and Velos Orbitrap.
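One common way to encode a tradeoff between false positive and false negative errors in a logistic regression loss is to weight the two classes differently in the log-loss. The sketch below does that with plain gradient descent. It is an assumption-laden illustration, not the loss function used for these datasets: the weights `fn_cost` and `fp_cost`, the learning-rate settings, the toy data, and the function names are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(X, y, fn_cost=1.0, fp_cost=1.0, lr=0.1, epochs=2000):
    """Fit logistic regression by gradient descent on a class-weighted log-loss.

    Label 1 is taken to mean "out of control" (an assumption for this sketch).
    fn_cost scales the loss on label-1 examples (penalizes false negatives);
    fp_cost scales the loss on label-0 examples (penalizes false positives).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            weight = fn_cost if yi == 1 else fp_cost
            err = weight * (p - yi)  # gradient of the weighted log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x) -> float:
    """Predicted probability that a dataset with features x is out of control."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Made-up one-feature QC metric with overlapping classes.
    X = [[0.0], [1.0], [2.0], [3.0], [1.5], [1.5]]
    y = [0, 0, 1, 1, 0, 1]
    w_eq, b_eq = fit_weighted_logistic(X, y)
    w_fn, b_fn = fit_weighted_logistic(X, y, fn_cost=5.0)
    print(predict_proba(w_eq, b_eq, [1.5]), predict_proba(w_fn, b_fn, [1.5]))
```

Raising `fn_cost` pushes ambiguous datasets toward the out-of-control class, trading more false positives for fewer missed failures.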