Project description:This dataset was used to assess the performance of a novel de novo metaproteomics pipeline, which performs sequence alignment of de novo sequences from complete metaproteomics experiments. Traditionally, metaproteomics data annotation relies on database searching, which requires sample-specific databases derived from whole metagenome sequencing experiments. Creating these databases, however, is a complex, time-consuming, and error-prone process that can introduce biases affecting the outcomes and conclusions, highlighting the need for alternative methods. The evaluated approach offers rapid and orthogonal insights into metaproteomics data.
2024-10-10 | PXD050548 | Pride
Project description:De novo metagenome whole genome assembly
Project description:For this manuscript, the Prochlorococcus MED4 strain shotgun proteome dataset was used for benchmarking a de novo-directed sequencing approach. De novo peptide sequencing determines the sequence of amino acids directly from mass spectra rather than by comparison (peptide-spectrum matching) to a selected database. We perform a benchmarking experiment using Prochlorococcus culture data, demonstrating that de novo peptides are sufficiently accurate and taxonomically specific to be useful in environmental studies. The MED4 dataset herein represents the output from peptide-spectrum matching using Comet within the Trans-Proteomic Pipeline (TPP). Additional MED4 data outside this manuscript are included for both trypsin and Glu-C protease digestions, as well as TPP output for post-translational modification searches. De novo output data derived from Peaks Studio can be found by referencing the manuscript publication.
Project description:Background: The soil environment is responsible for sustaining most terrestrial plant life on Earth, yet we know surprisingly little about the important functions carried out by diverse microbial communities in soil. Soil microbes that inhabit the channels of decaying root systems, the detritusphere, are likely to be essential for plant growth and health, as these channels are the preferred locations of new root growth. Understanding the microbial metagenome of the detritusphere and how it responds to agricultural management such as crop rotations and soil tillage will be vital for improving global food production. Methods: Rhizosphere soils of wheat and chickpea grown with (+) and without (-) decaying roots were collected for metagenomic sequencing. A gene catalogue was established by de novo assembly of the metagenomic sequencing data. Gene abundance was compared between bulk soil and rhizosphere soils under the different treatments. Conclusions: The study describes the diversity and functional capacity of a high-quality soil microbial metagenome. The results demonstrate the contribution of the microbiome from decaying roots in determining the metagenome of developing root systems, which is fundamental to plant growth, since roots preferentially inhabit previous root channels. Modifications in root microbial function through soil management can ultimately govern plant health, productivity and food security.
Project description:De novo peptide sequencing is a fundamental research area in mass spectrometry (MS) based proteomics. However, these methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species), and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performance depends on specific applications and the types of data. Our FDR estimation also shows that some tools may perform better than others in distinguishing between de novo PSMs and random matches, and can be used to assess the significance of de novo PSMs.
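As an illustration of the amino-acid-level and peptide-level accuracy metrics mentioned in this description, the following is a minimal sketch and not the NovoBoard implementation. It assumes each de novo prediction is paired with a reference peptide, compares residues position by position, and treats isoleucine and leucine as identical because they share the same mass. The function names and the dictionary keys are hypothetical; published benchmarks typically match residues by mass within a tolerance rather than by position.

```python
from typing import Dict, List, Tuple


def normalize(peptide: str) -> str:
    """Upper-case the peptide and map I to L (isobaric residues)."""
    return peptide.upper().replace("I", "L")


def amino_acid_matches(predicted: str, reference: str) -> int:
    """Count positions where predicted and reference residues agree.

    Simplifying assumption: positional comparison from the N-terminus.
    """
    p, r = normalize(predicted), normalize(reference)
    return sum(1 for a, b in zip(p, r) if a == b)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute amino-acid precision and peptide-level recall.

    pairs: list of (predicted, reference) peptide strings.
    """
    total_predicted_aa = 0
    matched_aa = 0
    correct_peptides = 0
    for predicted, reference in pairs:
        total_predicted_aa += len(predicted)
        matched_aa += amino_acid_matches(predicted, reference)
        if normalize(predicted) == normalize(reference):
            correct_peptides += 1
    return {
        # Fraction of predicted residues that agree with the reference.
        "aa_precision": matched_aa / total_predicted_aa if total_predicted_aa else 0.0,
        # Fraction of spectra whose full peptide was recovered.
        "peptide_recall": correct_peptides / len(pairs) if pairs else 0.0,
    }


if __name__ == "__main__":
    example = [("PEPTIDE", "PEPTLDE"), ("ACDK", "ACEK")]
    print(evaluate(example))
```

The two example pairs are made up for demonstration only; the first counts as a full match because of the I/L equivalence, the second differs at one residue.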
Project description:DNA methylation plays a critical role in development, particularly in repressing retrotransposons. The mammalian methylation landscape is dependent on the combined activities of the canonical maintenance enzyme Dnmt1 and the de novo Dnmts, 3a and 3b. Here we demonstrate that Dnmt1 displays de novo methylation activity in vitro and in vivo with specific retrotransposon targeting. We used whole-genome bisulfite and long-read Nanopore sequencing in genetically engineered methylation depleted embryonic stem cells to provide an in-depth assessment and quantification of this activity. Utilizing additional knockout lines and molecular characterization, we show that Dnmt1's de novo methylation activity depends on Uhrf1 and its genomic recruitment overlaps with targets that enrich for Trim28 and H3K9 trimethylation. Our data demonstrate that Dnmt1 can de novo add and maintain DNA methylation, especially at retrotransposons and that this mechanism may provide additional stability for long-term repression and epigenetic propagation throughout development.
Project description:For decades, technical and cost hurdles have prevented the systematic investigation of non-coding sequences in complex human diseases, and thus our knowledge about autism spectrum disorders (ASD) has been primarily obtained from analysis of protein-coding sequences. We have combined the analysis of whole genome sequencing with global studies of regulatory sequences of human cortical neurons to reveal the regulatory architecture of ASD. Analysis of de novo mutations from whole genome sequencing of 261 autism families revealed the physical proximity of ASD de novo mutations specifically to the cortical expression quantitative trait loci (eQTLs) of synaptic genes. We performed ATAC-Seq, ChIP-Seq, RNA-Seq and Hi-C experiments on human cortical neurons, which for the first time provided a panoramic view of the regulatory landscape in these cells. We found that ASD de novo mutations preferentially affect regulatory elements, and the associated genes are shared targets of two ASD syndromic factors, CHD8 and PTEN. Analyzing 15 chromatin states across 127 human tissue/cell types revealed a significant enrichment of ASD de novo mutations in active transcription start sites and the perturbed genes implicated in neuron functions; this distribution enabled us to develop a machine-learning algorithm to assess potential ASD risk for a given individual. Taken together, our study for the first time revealed the regulatory landscape in human neurons, demonstrated the importance of the non-coding genome in ASD and provided a general framework for analyzing regulatory mutations for other complex human diseases.
Project description:Dependent on concise, pre-defined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics datasets, and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
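The description does not say how false discovery rates are calibrated, so the following is only a generic sketch of the standard target-decoy idea, not the method used in this project. It assumes every scored match already carries a flag saying whether it came from a decoy, estimates the FDR at each score threshold as decoys divided by targets, and converts those estimates to q-values. The function names `q_values` and `accepted_targets` are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values from (score, is_decoy) pairs.

    Higher scores are assumed to be better. Returns (score, is_decoy, q)
    tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = 0
    decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate when accepting everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this threshold or any more permissive one.
    running_min = float("inf")
    qs = []
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def accepted_targets(psms: List[Tuple[float, bool]], alpha: float) -> int:
    """Count target matches whose q-value is at most alpha."""
    return sum(1 for _s, is_decoy, q in q_values(psms) if not is_decoy and q <= alpha)


if __name__ == "__main__":
    # Made-up scores for demonstration only.
    demo = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
    for row in q_values(demo):
        print(row)
    print(accepted_targets(demo, alpha=0.01))
```

How decoys are generated for de novo results is the hard part and is left outside this sketch; the code only covers the bookkeeping once target and decoy labels exist.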