A systematic evaluation of semispecific peptide search parameter enables identification of previously undescribed N-terminal peptides and conserved proteolytic processing in cancer cell lines
ABSTRACT: Background: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research. A variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance and of the enzymatic constraints on the performance of three search engines.
Methods: The open-source search engines Myrimatch, xTandem and MSGF+ were evaluated for their suitability for semi- and unspecific searches and for the importance of accurate fragment mass spectra. Applying the best-suited parameters, we performed a semispecific reanalysis of the published NCI-60 deep proteome data.
Results: Semi- and unspecific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra while this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably between the three search engines with regard to semi- and non-specific peptide-spectrum matching. Semi-specific reanalysis of NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing or likely alternative translation start sites, some of which were ubiquitously present in all cell lines of the reanalyzed panel.
Conclusions: Highly accurate MS2 fragment data in combination with modern open-source search algorithms facilitate the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining in proteomic datasets.
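To illustrate the enzymatic constraints compared above, the following minimal Python sketch enumerates fully specific and semispecific tryptic peptides for a toy sequence. It is an illustration only, not the implementation of any of the evaluated search engines. The function names, the toy sequence and the simplified cleavage rule (cut after K or R unless followed by P) are assumptions made for this example.

```python
import re


def cleavage_sites(protein):
    """Positions where trypsin may cut: after K or R, unless followed by P.

    The protein start (0) and end (len) are included as termini.
    """
    cuts = {m.end() for m in re.finditer(r"[KR](?!P)", protein)}
    return sorted(cuts | {0, len(protein)})


def fully_specific(protein, missed_cleavages=0):
    """Spans (start, end) whose two termini both lie on cleavage sites."""
    sites = cleavage_sites(protein)
    spans = set()
    for i, start in enumerate(sites[:-1]):
        for end in sites[i + 1 : i + 2 + missed_cleavages]:
            spans.add((start, end))
    return spans


def semi_specific(protein, missed_cleavages=0, min_len=1):
    """Spans with at least one terminus on a cleavage site."""
    spans = set()
    for start, end in fully_specific(protein, missed_cleavages):
        for k in range(start + 1, end + 1):
            spans.add((start, k))  # enzymatic N-terminus, free C-terminus
        for k in range(start, end):
            spans.add((k, end))  # free N-terminus, enzymatic C-terminus
    return {(s, e) for s, e in spans if e - s >= min_len}


if __name__ == "__main__":
    toy = "AAKCCRDD"  # hypothetical toy sequence
    full = fully_specific(toy)
    semi = semi_specific(toy)
    print(len(full), "fully specific vs", len(semi), "semispecific candidates")
```

Even for this short toy sequence the semispecific candidate list is several times larger than the fully specific one, which is consistent with the differences in search speed between enzymatic constraints reported in the results.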
The converted .mzML files and the sequence databases for the different biological samples are provided. The analysis results are provided as compressed folders containing the results of multiple searches for each .mzML file, e.g. with different enzymatic constraints and different fragment mass tolerance settings. The NCI-60 raw data from nine representative cancer cell lines were retrieved from https://www.proteomicsdb.org/proteomicsdb/#projects/35/258 and converted to the open mzML format with msconvert, using default settings plus an additional "metadataFixer" filter. We also provide the sequence database file and the complete reanalysis as compressed Galaxy history files, which can be downloaded, extracted and imported on https://usegalaxy.eu.
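The conversion step described above could be scripted roughly as follows. This is a hedged sketch: the exact invocation used for this dataset is not given in the record, so the input file name, the output directory and the helper function are hypothetical. Only the msconvert options `--mzML`, `--filter` and `-o` and the `metadataFixer` filter name are taken as given. The sketch builds and prints the command without running it.

```python
import shlex


def msconvert_command(raw_file, out_dir="mzML"):
    """Build an msconvert call: default settings plus the metadataFixer filter.

    raw_file and out_dir are placeholders chosen for this example.
    """
    return [
        "msconvert",
        str(raw_file),
        "--mzML",  # write the open mzML format
        "--filter", "metadataFixer",  # additional filter named in the record
        "-o", str(out_dir),  # output directory (assumed)
    ]


if __name__ == "__main__":
    cmd = msconvert_command("example_cell_line.raw")  # hypothetical file name
    # Print the command for inspection; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```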
INSTRUMENT(S): Q Exactive Plus, LTQ Orbitrap Elite
ORGANISM(S): Escherichia coli (ncbitaxon:562), Homo sapiens (ncbitaxon:9606), Mus musculus (ncbitaxon:10090)
SUBMITTER: Oliver Schilling
PROVIDER: MSV000087034 | MassIVE | Thu Mar 11 03:41:00 GMT 2021
SECONDARY ACCESSION(S): PXD024676
REPOSITORIES: MassIVE