Project description:De novo peptide sequencing is a fundamental research area in mass spectrometry (MS) based proteomics. However, these methods have often been evaluated using a few simple metrics that do not fully reflect their overall performance. Moreover, there has been no established method to estimate the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species) and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performance depends on the specific application and the type of data. Our FDR estimation also shows that some tools may perform better than others in distinguishing between de novo PSMs and random matches, and that it can be used to assess the significance of de novo PSMs.
2024-08-28 | PXD055277 | Pride
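The description above refers to estimating FDRs from target-decoy spectra but does not say how NovoBoard builds decoy spectra or computes the estimate. As a rough illustration only, the generic target-decoy estimate (decoy matches divided by target matches above a score threshold) can be sketched as follows. The function name `target_decoy_fdr`, the `(score, is_decoy)` input layout, and the toy scores are assumptions made for this sketch, and it assumes target and decoy sets of equal size.

```python
from typing import List, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs. This is the generic
    decoys/targets estimate, not necessarily the procedure used by NovoBoard.
    """
    # Count accepted matches separately for target and decoy spectra.
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target PSMs, so there is nothing to be falsely discovered.
        return 0.0
    # Cap at 1.0 because an FDR above 100% is not meaningful.
    return min(1.0, decoys / targets)


# Hypothetical toy scores, for illustration only.
psms = [
    (0.95, False),
    (0.90, False),
    (0.85, True),
    (0.80, False),
    (0.60, True),
    (0.55, False),
]

for cutoff in (0.9, 0.8, 0.5):
    print(cutoff, round(target_decoy_fdr(psms, cutoff), 3))
```

Lowering the cutoff accepts more PSMs but also more decoy matches, so the estimated FDR rises. Sweeping the cutoff in this way is how a score threshold for a desired FDR is usually chosen.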
Project description:De novo transcriptome assemblies of three Darevskia lizard species
Project description:Predicted peptides for the 9-species de novo sequencing benchmark MSV000090982 as described in Yilmaz et al. [Yilmaz2023]. FTP directory contains outputs of 5 de novo peptide sequencing methods on the 9-species benchmark: Casanovo, Casanovo_bm (benchmark), PointNovo, DeepNovo and Novor. [Yilmaz2023] M. Yilmaz*, W. Fondrie*, W. Bittremieux*, R. Nelson, V. Ananth, S. Oh, and W. Noble, "Sequence-to-sequence translation from mass spectra to peptides with a transformer model", bioRxiv, 2023.
Project description:We generated a protein database directly from soil metaproteomic data by identifying the microbial composition using the Kaiko model's de novo sequencing methods. We first analyzed the mass spectra de novo (without a database), identifying species from the observed peptides. We then gathered full proteomic databases for the identified species and searched the mass spectrometry data against this custom-assembled protein sequence database using MS-GF+.
2020-10-20 | MSV000086336 | MassIVE
Project description:Three de novo Lamiaceae genomes
Project description:Predicted peptides for the 9-species de novo sequencing benchmark MSV000090982 as described in Yilmaz et al. [Yilmaz2023]. FTP directory contains outputs of 5 de novo peptide sequencing methods on the 9-species benchmark: Casanovo, Casanovo_bm (benchmark), PointNovo, DeepNovo and Novor. Output files for Casanovo contain scan numbers and run names to allow matching to spectra files. [Yilmaz2023] M. Yilmaz*, W. Fondrie*, W. Bittremieux*, R. Nelson, V. Ananth, S. Oh, and W. Noble, "Sequence-to-sequence translation from mass spectra to peptides with a transformer model", bioRxiv, 2023.
Project description:We describe an application of deep sequencing and de novo assembly of short RNA reads to investigate small interfering RNA (siRNA)-mediated immunity in leaf samples from eight tree taxa naturally occurring in Wytham Woods, Oxfordshire, UK. A BLAST search for homologues of the contigs in GenBank identified siRNA populations against a number of RNA viruses and a Ty1-copia retrotransposon in these tree species. Small RNA sequencing and de novo assembly.
Project description:Because they depend on concise, pre-defined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics datasets, and a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) which leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discrimination between correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
Project description:Precision de novo peptide sequencing using mirror proteases of Ac-LysargiNase and trypsin for large-scale proteomics