Proteomics

Dataset Information

De novo nine-species benchmark


ABSTRACT: We created a new version of the nine-species benchmark originally described by Tran et al. [Tran2017]. To do so, we downloaded the RAW files from the same nine PRIDE projects (PXD005025, PXD004948, PXD004325, PXD004565, PXD004536, PXD004947, PXD003868, PXD004467, PXD004424) and converted them to MGF format using ThermoRawFileParser v1.3.4. We also downloaded the corresponding nine UniProt reference proteomes and constructed a Tide index for each one using Crux version 4.1. For one species (Vigna mungo), no reference proteome is available, so we used the proteome of the closely related species Vigna radiata. We allowed the following variable modifications: Met oxidation, Asn deamidation, Gln deamidation, N-term acetylation, N-term carbamylation, N-term NH3 loss, and the combination of N-term carbamylation and NH3 loss, specified with the tide-index options "--mods-spec 1M+15.994915, 1N+0.984016, 1Q+0.984016 --nterm-peptide-mods-spec 1X+42.010565, 1X+43.005814, 1X-17.026549, 1X+25.980265 --max-mods 3". Note that one of the nine experiments (Mus musculus) was performed using SILAC labeling, but we searched without SILAC modifications and hence include in the benchmark only PSMs from unlabeled peptides. Each index also contains one shuffled decoy peptide for each target peptide.

Each MGF file was searched against the corresponding index using the precursor window size and fragment bin tolerance specified in the original study. We used XCorr scoring with Tailor calibration and allowed for 1 isotope error in the selection of candidate peptides. All search results were then analyzed jointly per species using the Crux implementation of Percolator with default parameters.

For the benchmark, we retained all PSMs with Percolator q-value < 0.01. We identified 13 MGF files with fewer than 100 accepted PSMs each and eliminated all PSMs from those files from the benchmark. We then post-processed the PSMs to eliminate peptides that are shared between species. Among the 229,984 unique peptides, we identified 3,797 (1.7%) that occur in more than one species. For each such peptide, we selected one of the associated species at random and then eliminated all PSMs containing that peptide in the other species. The final benchmark dataset consists of 2.8 million PSMs drawn from 343 RAW files, exported as annotated MGF files.

Note: the annotated MGF files in the initial data submission did not account for the N-terminal modifications listed above. The update available in the `/MSV000090982/updates/2024-05-14_woutb_71950b89/peak/9speciesbenchmark` FTP directory contains the corrected MGF files, which are directly compatible with Casanovo.

[Tran2017] Tran, N. H., Zhang, X., Xin, L., Shan, B. & Li, M. De novo peptide sequencing by deep learning. Proceedings of the National Academy of Sciences of the United States of America 114(31), 8247-8252 (2017).
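The PSM filtering and shared-peptide steps described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the submitters' code: the function name `build_benchmark`, the dictionary keys (`species`, `file`, `peptide`, `q_value`), and the `seed` argument are all hypothetical, and it assumes that each file identifier is unique across species and that peptides are compared as plain strings.

```python
import random
from collections import Counter, defaultdict


def build_benchmark(psms, q_threshold=0.01, min_psms_per_file=100, seed=0):
    """Filter PSMs as outlined in the abstract (hypothetical record layout).

    psms: iterable of dicts with keys 'species', 'file', 'peptide', 'q_value'.
    """
    # 1. Keep only PSMs accepted at the q-value threshold (q < 0.01 by default).
    accepted = [p for p in psms if p["q_value"] < q_threshold]

    # 2. Drop every PSM from files with fewer than `min_psms_per_file` accepted PSMs.
    per_file = Counter(p["file"] for p in accepted)
    kept = [p for p in accepted if per_file[p["file"]] >= min_psms_per_file]

    # 3. Find peptides that occur in more than one species.
    species_by_peptide = defaultdict(set)
    for p in kept:
        species_by_peptide[p["peptide"]].add(p["species"])

    # 4. For each shared peptide, pick one species at random to keep it.
    #    Sorting makes the choice reproducible for a given seed.
    rng = random.Random(seed)
    owner = {
        peptide: rng.choice(sorted(species))
        for peptide, species in species_by_peptide.items()
        if len(species) > 1
    }

    # 5. Remove PSMs of shared peptides from all species except the chosen one.
    return [p for p in kept if owner.get(p["peptide"], p["species"]) == p["species"]]
```

Calling `build_benchmark(records)` with the defaults applies the q-value < 0.01 and 100-PSM thresholds stated above; the random assignment of shared peptides depends on `seed`, so it will not reproduce the exact assignment in the published benchmark.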

INSTRUMENT(S): Q Exactive

ORGANISM(S): Bacillus subtilis (ncbitaxon:1423); Candidatus Thiodiazotropha endoloripes (ncbitaxon:1818881); Vigna mungo (ncbitaxon:3915); Solanum lycopersicum (ncbitaxon:4081); Saccharomyces cerevisiae (ncbitaxon:4932); Homo sapiens (ncbitaxon:9606); Mus musculus (ncbitaxon:10090); Methanosarcina mazei (ncbitaxon:2209); Apis mellifera (ncbitaxon:7460)

SUBMITTER: William Stafford Noble  

PROVIDER: MSV000090982 | MassIVE | Mon Jan 02 03:05:00 GMT 2023

REPOSITORIES: MassIVE

Similar Datasets

2024-08-28 | PXD055277 | Pride
2013-11-29 | PXD000131 | Pride
2020-01-03 | GSE142838 | GEO
2014-07-01 | E-GEOD-58949 | biostudies-arrayexpress
2015-11-05 | PXD000299 | Pride
2024-03-28 | MSV000094434 | MassIVE
2013-05-05 | PXD000138 | Pride
2024-10-23 | MSV000096182 | MassIVE
2013-08-13 | PXD000264 | Pride
2023-06-24 | PXD043262 |