
Dataset Information


PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq.


ABSTRACT: Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with the customized protein database is needed. A pipeline implemented as an R package, named PGA, was developed. It automates the processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and the construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA can then identify novel peptides and generate an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/. The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to be capable of identifying novel peptides by searching a customized protein database derived from RNA-Seq data.
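
The core idea in the abstract, that a peptide counts as novel when it can be produced from the customized database but not from the reference database, can be illustrated with a small sketch. This is not PGA's implementation or API: the function names, the toy sequences, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the minimum peptide length are all assumptions made for illustration.

```python
def tryptic_digest(protein, min_len=6):
    """Simplified in silico trypsin digest (illustrative assumption):
    cleave after K or R unless the next residue is P, no missed cleavages."""
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            pep = protein[start:i + 1]
            if len(pep) >= min_len:
                peptides.add(pep)
            start = i + 1
    return peptides


def novel_peptides(reference, customized, min_len=6):
    """Return peptides found only in the customized database,
    mapped to the customized entries that produce them."""
    ref_peptides = set()
    for seq in reference.values():
        ref_peptides |= tryptic_digest(seq, min_len)

    novel = {}
    for name, seq in customized.items():
        for pep in tryptic_digest(seq, min_len):
            if pep not in ref_peptides:
                novel.setdefault(pep, []).append(name)
    return novel


# Toy data (hypothetical): the customized entry carries one amino acid
# substitution (F -> L), mimicking a SAP called from RNA-Seq data.
reference = {"P1": "MKTAYIAKQRQISFVKSHFSR"}
customized = {"P1_SAP": "MKTAYIAKQRQISLVKSHFSR"}

print(novel_peptides(reference, customized))
```

In this toy case only the peptide spanning the substituted residue is reported, because every other peptide of sufficient length is also present in the reference digest. A real workflow would additionally match these candidate peptides against MS/MS spectra with a search engine and control the false discovery rate, which the sketch does not attempt.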

SUBMITTER: Wen B 

PROVIDER: S-EPMC4912784 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature


Publications

PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq.

Wen Bo, Xu Shaohang, Zhou Ruo, Zhang Bing, Wang Xiaojing, Liu Xin, Xu Xun, Liu Siqi

BMC Bioinformatics, 2016-06-17, issue 1



Similar Datasets

| S-EPMC4155246 | biostudies-literature
| S-EPMC3727138 | biostudies-literature
| S-EPMC6567655 | biostudies-literature
| S-EPMC6404334 | biostudies-literature
| S-EPMC8697502 | biostudies-literature
| S-EPMC4918025 | biostudies-other
| S-EPMC11348166 | biostudies-literature
| S-EPMC4194139 | biostudies-literature
| S-EPMC3842753 | biostudies-literature
| S-EPMC4595899 | biostudies-literature