Dataset Information

A systematic evaluation of semispecific peptide search parameter enables identification of previously undescribed N-terminal peptides and conserved proteolytic processing in cancer cell lines

ABSTRACT: Background: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research. A variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance as well as the enzymatic constraints on the performance of three search engines. Methods: The open-source search engines Myrimatch, xTandem and MSGF+ were evaluated with regard to their suitability for semi- and unspecific searches and to the importance of accurate fragment mass spectra. Applying the most suitable parameters, we performed a semispecific reanalysis of the published NCI-60 deep proteome data. Results: Semi- and unspecific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra, while this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably between the three search engines for semi- and unspecific peptide-spectrum matching. Semispecific reanalysis of the NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing or likely alternative translation start sites, some of which were present in all cell lines of the reanalyzed panel. Conclusions: Highly accurate MS2 fragment data in combination with modern open-source search algorithms facilitate the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining of proteomic datasets. The converted .mzML files as well as the sequence databases for the different biological samples are provided.
The analysis results are provided as compressed folders containing the results of multiple searches for each .mzML file, e.g. using different enzymatic constraints and different fragment mass tolerance settings. The NCI-60 raw data from nine representative cancer cell lines were retrieved from https://www.proteomicsdb.org/proteomicsdb/#projects/35/258 and converted to the open mzML format with msconvert, using default settings plus the additional "metadataFixer" filter. Here we provide the sequence database file as well as the complete reanalysis as compressed Galaxy history files, which can be downloaded, extracted and imported into https://usegalaxy.eu.
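To make the difference between fully specific, semispecific and unspecific searches concrete, the following Python sketch classifies a peptide by how many of its termini fit a protease cleavage rule. It is a hypothetical illustration, not the logic used by Myrimatch, xTandem or MSGF+: the simplified trypsin-like rule (cut after K or R, but not before P), the treatment of protein termini as specific ends, the example sequence and the names `is_cleavage_site` and `classify_peptide` are all assumptions.

```python
def is_cleavage_site(protein: str, i: int, residues: str = "KR",
                     blocked_by_proline: bool = True) -> bool:
    """Return True if a cut between protein[i - 1] and protein[i] fits the rule.

    Assumption: the protein N- and C-termini always count as specific ends.
    """
    if i == 0 or i == len(protein):
        return True
    if protein[i - 1] not in residues:
        return False
    # Trypsin-like rule (assumed): no cleavage when the next residue is proline.
    return not (blocked_by_proline and protein[i] == "P")


def classify_peptide(protein: str, peptide: str, residues: str = "KR",
                     blocked_by_proline: bool = True) -> str:
    """Classify a peptide as 'fully specific', 'semispecific' or 'unspecific'.

    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError(f"{peptide} not found in protein")
    end = start + len(peptide)
    n_specific = is_cleavage_site(protein, start, residues, blocked_by_proline)
    c_specific = is_cleavage_site(protein, end, residues, blocked_by_proline)
    if n_specific and c_specific:
        return "fully specific"
    if n_specific or c_specific:
        return "semispecific"
    return "unspecific"


if __name__ == "__main__":
    protein = "MAGKPLLEDSRAVTNEKLLR"  # hypothetical sequence
    for pep in ("AVTNEK", "VTNEK", "VTNE", "PLLEDSR"):
        # AVTNEK: fully specific; VTNEK: semispecific;
        # VTNE: unspecific; PLLEDSR: semispecific (K-P bond is not cut)
        print(pep, classify_peptide(protein, pep))
```

A fully specific search only admits peptides like `AVTNEK`; a semispecific search also admits peptides like `VTNEK`, whose N-terminus does not follow a cleavage site, which is where processed protein N-termini or alternative translation starts can appear. Passing `residues="R", blocked_by_proline=False` roughly approximates an Arg-C/P-style rule.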

INSTRUMENT(S): Q Exactive Plus, LTQ Orbitrap Elite

ORGANISM(S): Escherichia coli (ncbitaxon:562), Homo sapiens (ncbitaxon:9606), Mus musculus (ncbitaxon:10090)

SUBMITTER: Oliver Schilling

PROVIDER: MSV000087034 | MassIVE | Thu Mar 11 03:41:00 GMT 2021

SECONDARY ACCESSION(S): PXD024676

REPOSITORIES: MassIVE

Publications

A Systematic Evaluation of Semispecific Peptide Search Parameter Enables Identification of Previously Undescribed N-Terminal Peptides and Conserved Proteolytic Processing in Cancer Cell Lines.

Matthias Fahrner, Lucas Kook, Klemens Fröhlich, Martin L. Biniossek, Oliver Schilling

Proteomes (2021 May 25), issue 2

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research. A variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance in combination with the enzymatic constraints on the performance of three search engines. Thr ...

PMID: 34070654

Similar Datasets

Project description: Shotgun and positional proteomics study of a mouse embryonic stem cell line. We devised a proteogenomic approach that constructs a custom protein sequence search space from both SwissProt and RIBO-seq-derived translation products for LC-MS/MS spectrum identification. To assess the impact of using this deep proteome database, we performed two alternative MS-based proteomic strategies: (I) regular shotgun proteomics and (II) an N-terminal COFRADIC approach. The resulting fragmentation spectra were searched against the custom database (a combination of UniProtKB-SwissProt and RIBO-seq-derived translation sequences) using three search engines: OMSSA (version 2.1.9), X!Tandem (TORNADO, version 2010.01.01.04) and Mascot (version 2.3). The first two were run from the SearchGUI graphical user interface (version 1.10.4). A combination of X!Tandem and Mascot was used for the N-terminal COFRADIC analysis, and a combination of all three search engines for the shotgun proteome analysis. Note that OMSSA cannot handle the semi-ArgC/P protease setting needed to analyze N-terminal COFRADIC data. For the shotgun proteome data, trypsin was set as the cleavage enzyme, allowing one missed cleavage; singly to triply charged precursors were considered for Mascot and singly to quadruply charged precursors for X!Tandem/OMSSA, and the precursor and fragment mass tolerances were set to 10 ppm and 0.5 Da, respectively. Methionine oxidation to methionine sulfoxide, pyroglutamate formation from N-terminal glutamine and protein N-terminal acetylation were set as variable modifications. For the N-terminal COFRADIC analysis, the semi-ArgC/P protease setting (Arg-C specificity with cleavage of arginine-proline bonds allowed) was used. No missed cleavages were allowed, and the precursor and fragment mass tolerances were again set to 10 ppm and 0.5 Da, respectively.
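These settings combine a relative precursor tolerance (10 ppm) with an absolute fragment tolerance (0.5 Da). The Python sketch below shows how the two kinds of window are checked and how much wider 0.5 Da is than 10 ppm at a typical mass; the helper names `ppm_error`, `within_ppm` and `within_da` are hypothetical, and this is not code from SearchGUI or any of the search engines named here.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """Relative tolerance, as used here for precursors (10 ppm)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_da(observed: float, theoretical: float, tol_da: float = 0.5) -> bool:
    """Absolute tolerance, as used here for fragments (0.5 Da)."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    print(within_ppm(1000.008, 1000.0))  # about 8 ppm off: inside the 10 ppm window
    print(within_ppm(1000.020, 1000.0))  # about 20 ppm off: outside
    print(within_da(500.30, 500.0))      # 0.3 off: inside the 0.5 Da window
    # Expressed in ppm, a 0.5 Da window at mass 500 is 1000 ppm on each side.
    print(round(0.5 / 500.0 * 1e6))
```

At 10 ppm, a precursor at mass 1000 must match within about 0.01, whereas a 0.5 Da fragment window admits far more candidate matches; this is why accurate fragment data matters most when the enzymatic constraint is relaxed.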
Carbamidomethylation of cysteine, methionine oxidation to methionine sulfoxide and 13C2D3-acetylation of lysines were set as fixed modifications. Peptide N-terminal acetylation or 13C2D3-acetylation and pyroglutamate formation from N-terminal glutamine were set as variable modifications, and the instrument setting was ESI-TRAP. Protein and peptide identification as well as data interpretation were performed with PeptideShaker (http://code.google.com/p/peptide-shaker, version 0.18.3), with the false discovery rate set to 1% at all levels (protein, peptide and peptide-spectrum match). The aforementioned tools and algorithms (SearchGUI, X!Tandem, OMSSA and PeptideShaker) are freely available as open source.
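A false discovery rate such as the 1% used here is commonly estimated with a target-decoy approach. The Python sketch below shows a generic PSM-level version: rank matches by score, estimate the FDR at each threshold as decoys divided by targets, convert these estimates to q-values and keep target PSMs with q ≤ 0.01. It is a simplified illustration under stated assumptions (higher scores are better, score ties and protein- or peptide-level inference are ignored), not PeptideShaker's actual algorithm; `PSM` and `filter_at_fdr` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float    # higher means a better match (assumption)
    is_decoy: bool  # True if the match is to a decoy sequence


def filter_at_fdr(psms: list[PSM], fdr: float = 0.01) -> list[PSM]:
    """Return target PSMs whose target-decoy q-value is at most `fdr`."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    # FDR estimate at each rank: decoys / targets among PSMs scoring at least this well.
    estimates = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        estimates.append(decoys / max(targets, 1))

    # q-value: lowest FDR estimate at this rank or at any lower-scoring rank.
    q_values = [0.0] * len(estimates)
    running_min = float("inf")
    for i in range(len(estimates) - 1, -1, -1):
        running_min = min(running_min, estimates[i])
        q_values[i] = running_min

    return [p for p, q in zip(ranked, q_values) if not p.is_decoy and q <= fdr]


if __name__ == "__main__":
    # Hypothetical data: 100 targets scored 100..1 and one decoy scored 50.5.
    psms = [PSM(float(s), False) for s in range(100, 0, -1)] + [PSM(50.5, True)]
    print(len(filter_at_fdr(psms, 0.01)))   # 100
    print(len(filter_at_fdr(psms, 0.005)))  # 50
```

The backward running minimum makes q-values monotone, so lowering the score threshold never rescues a PSM that a stricter threshold would reject.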

Project description: In recent years, high-throughput technologies have contributed to a more precise picture of the human proteome. However, 2,129 proteins remain listed as missing proteins (MPs) in the latest neXtProt release (2019-02). The main reasons for MPs are low abundance, low molecular weight (LMW), unexpected modifications, membrane characteristics, etc. Moreover, more than 50% of MS/MS spectra are not successfully identified in shotgun proteomics. Open-pFind, an efficient open search engine recently released by the pFind group in China, offers an opportunity to identify these buried MPs in complex samples. Proteins and potential MPs were identified using Open-pFind and three other search engines to compare their performance and efficiency on three large-scale datasets digested by different enzymes. Our results demonstrated that Open-pFind identified 29.9-47.5% more peptide-spectrum matches (PSMs), 48.0-63.9% more peptide sequences (with modifications) and 22.7-38.1% more peptide sequences (regardless of modifications) than the second-best search engine. As a result, Open-pFind detected 7.5-19.3% more candidate MPs than the second-best search engine. In total, 5 (PE2) of the 150 candidate MPs identified by Open-pFind were verified by two unique peptides of more than 9 amino acids (AA), using theoretical spectrum prediction with pDeep and matching against synthesized peptides with pBuild, after spectrum quality analysis and filtering for isobaric post-translational modifications and single amino acid variants (SAAVs). These five verified MPs can be ranked at the PE1 level. In addition, three other candidate MPs were verified with two unique peptides (one containing more than 9 AA and the other only 8 AA), which falls slightly short of the criteria listed by C-HPP and requires additional verification information. More importantly, unexpected modifications were detected in these MPs.
Another 141 MPs were listed as candidates, but required additional verification information.
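The peptide-based verification criterion described above can be sketched as a simple filter. The Python function below is a hypothetical illustration assuming a C-HPP-style rule of at least two uniquely mapping, non-nested peptides of a minimum length; the text says "more than 9" AA, so `min_length=9` is an assumption chosen such that the 8-residue case fails, and the spectrum-quality, isobaric-PTM and SAAV filtering steps are not reproduced. The function name and example peptides are made up.

```python
def meets_two_peptide_rule(unique_peptides: list[str], min_length: int = 9,
                           required: int = 2) -> bool:
    """Check whether a protein has enough long, non-nested, uniquely mapping peptides.

    `unique_peptides` is assumed to contain only peptides that map to this protein alone.
    """
    long_peptides = sorted({p for p in unique_peptides if len(p) >= min_length},
                           key=len, reverse=True)
    kept: list[str] = []
    for pep in long_peptides:
        # Longer peptides come first, so a peptide contained in one already kept is skipped.
        if not any(pep in other for other in kept):
            kept.append(pep)
    return len(kept) >= required


if __name__ == "__main__":
    print(meets_two_peptide_rule(["ELVISLIVESK", "QWERTYASDF"]))  # True: 11 and 10 residues
    print(meets_two_peptide_rule(["ELVISLIVESK", "ASDFGHIK"]))    # False: second has 8 residues
    print(meets_two_peptide_rule(["ELVISLIVESK", "LVISLIVESK"]))  # False: second is nested
```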