Dataset Information

Open-pFind enables precise, comprehensive and rapid peptide identification in shotgun proteomics, part 2

ABSTRACT: Shotgun proteomics has grown rapidly in recent decades, but a large fraction of tandem mass spectrometry (MS/MS) data in shotgun proteomics are not successfully identified. We have developed a novel database search algorithm, Open-pFind, to efficiently identify peptides even in an ultra-large search space that takes into account unexpected modifications, amino acid mutations, semi- or non-specific digestion and co-eluting peptides. Tested on two metabolically labeled MS/MS datasets, Open-pFind reported 50.5‒117.0% more peptide-spectrum matches (PSMs) than the seven other advanced algorithms. More importantly, the Open-pFind results were more credible, as judged by the verification experiments using stable isotopic labeling. Tested on four additional large-scale datasets, 70‒85% of the spectra were confidently identified, and high-quality spectra were nearly completely interpreted by Open-pFind. Further, Open-pFind was over 40 times faster than the other three open search algorithms and 2‒3 times faster than three restricted search algorithms. Re-analysis of an entire human proteome dataset consisting of ~25 million spectra using Open-pFind identified a total of 14,064 proteins encoded by 12,723 genes by requiring at least two uniquely identified peptides. In these search results, Open-pFind also excelled in an independent test for false positives based on the presence or absence of olfactory receptors. Thus, a practical use of the open search strategy has been realized by Open-pFind for the truly global-scale proteomics experiments of today and in the future.

INSTRUMENT(S):

ORGANISM(S): Saccharomyces cerevisiae (baker's yeast)

SUBMITTER: Hao Chi

LAB HEAD: Si-Min He

PROVIDER: PXD008783 | PRIDE | 2018-07-11

REPOSITORIES: PRIDE

Similar Datasets

Project description: In recent years, high-throughput technologies have contributed to developing a more precise picture of the human proteome. However, 2,129 proteins remain listed as missing proteins (MPs) in the newest neXtProt release (2019-02). The main reasons for MPs are low abundance, low molecular weight (LMW), unexpected modifications, membrane characteristics, etc. Moreover, more than 50% of the MS/MS data have not been successfully identified in shotgun proteomics. Open-pFind, an efficient open search engine recently released by the pFind group in China, presents an opportunity to identify these buried MPs in complex samples. Proteins and potential MPs were identified using Open-pFind and three other search engines to compare their performance and efficiency on three large-scale datasets digested by different enzymes. Our results demonstrated that Open-pFind identified 29.9-47.5% more peptide-spectrum matches (PSMs), 48.0-63.9% more peptide sequences (with modifications) and 22.7-38.1% more peptide sequences (regardless of modifications) than the second-best search engine. As a result, Open-pFind detected 7.5-19.3% more candidate MPs than the second-best search engine. In total, 5 (PE2) of the 150 candidate MPs identified by Open-pFind were verified from two unique peptides containing more than 9 amino acids (AA) by using spectrum theoretical prediction with pDeep and synthesized peptide matching with pBuild, after spectrum quality analysis, isobaric post-translational modification, and single amino acid variant (SAAV) filtering. These five verified MPs can be ranked at the PE1 level. In addition, three other candidate MPs were verified with two unique peptides (one peptide containing more than 9 AA and the other containing only 8 AA), which falls slightly short of the criteria listed by C-HPP, and required additional verification information. More importantly, unexpected modifications were detected in these MPs. Another 141 MPs were listed as candidates, but required additional verification information.

			Action	DRS
	10.raw	Raw
	14.raw	Raw
	21.raw	Raw
	22.raw	Raw
	23.raw	Raw

