Proteomics

Dataset Information


CCPRD: A novel analytical framework for comprehensive proteomic reference database construction of non-model organisms


ABSTRACT: Protein reference databases are critical to efficient proteomic analyses. However, a method for constructing clean, efficient, and comprehensive protein reference databases is lacking. Existing methods either lack contamination control procedures or rely on three-frame and/or six-frame translation, which sharply increases the search space and harms MS results. Herein we propose a framework for constructing a customized comprehensive proteomic reference database (CCPRD) from draft genomes and deep-sequencing transcriptomes. Its effectiveness is demonstrated using the nematocyst proteomes of endoparasitic cnidarians, the myxozoans. By applying customized contamination removal procedures, contaminants in the omic data were successfully identified and removed. The procedure does not over-decontaminate, as shown by comparing the CCPRD MS results with those from an artificially contaminated database and from a database in which the contaminants removed from the genomes and transcriptomes were added back. CCPRD outperformed traditional frame-based methods, identifying 35.2%-50.7% more peptides and 35.8%-43.8% more proteins, with a size reduction of up to 84.6%. A BUSCO analysis showed that the CCPRD maintained a relatively high level of completeness compared to traditional methods. These results confirm the superiority of the CCPRD over existing methods in peptide and protein identification numbers, database size, and completeness. Because it provides a general framework for generating the reference database and does not require a high-quality genome, the CCPRD can potentially be applied to any organism and contribute significantly to proteomic research.
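To illustrate why the frame-based methods mentioned in the abstract enlarge the search space, here is a minimal sketch of six-frame translation. It is not part of the CCPRD pipeline; the function names are hypothetical, and it assumes the standard genetic code and unambiguous A/C/G/T input (any other codon is emitted as "X").

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> list[str]:
    """Translate three forward and three reverse-complement reading frames."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the dataset.
    for frame in six_frame_translations("ATGGCCTAA"):
        print(frame)
```

Every nucleotide sequence yields six candidate protein strings, of which at most one is usually biologically meaningful, so a database built this way carries many sequences that can only produce spurious matches.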

INSTRUMENT(S): Q Exactive HF

ORGANISM(S): Thelohanellus kitauei, Myxobolus honghuensis, Myxobolus wulii

TISSUE(S): Nematocyst

SUBMITTER: Qingxiang Guo

LAB HEAD: Zemao Gu

PROVIDER: PXD018851 | Pride | 2020-07-09

REPOSITORIES: Pride

Dataset's files

H_1_all_6_frame_txt.rar Other
H_2_CCPRD_txt.rar Other
H_3_CCPRD_contam_txt.rar Other
H_4_CCPRD_remove_txt.rar Other
H_5_trans_6_frame_txt.rar Other
Showing 1 - 5 of 18 files

Publications

CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms.

Qingxiang Guo, Dan Li, Yanhua Zhai, Zemao Gu

ACS Omega | 2020-06-17 | Issue 25


Protein reference databases are a critical part of producing efficient proteomic analyses. However, the method for constructing clean, efficient, and comprehensive protein reference databases of nonmodel organisms is lacking. Existing methods either do not have contamination control procedures, or these methods rely on a three-frame and/or six-frame translation that sharply increases the search space and the need for computational resources. Herein, we propose a framework for constructing a cust ...

Similar Datasets

2011-01-12 | E-TABM-1209 | biostudies-arrayexpress
2008-04-06 | E-GEOD-8404 | biostudies-arrayexpress
2011-10-14 | E-GEOD-28630 | biostudies-arrayexpress
2017-04-21 | PXD005924 | Pride
2020-07-22 | E-MTAB-9350 | biostudies-arrayexpress
2014-07-30 | E-GEOD-41032 | biostudies-arrayexpress
2007-06-01 | E-MEXP-753 | biostudies-arrayexpress
2008-07-17 | E-GEOD-11234 | biostudies-arrayexpress
2012-01-07 | E-GEOD-33484 | biostudies-arrayexpress
2015-02-27 | E-MTAB-3290 | biostudies-arrayexpress