
Dataset Information


MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.


ABSTRACT: Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript levels. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still depend mainly on six-frame translations or on reference protein databases that are extended with transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frame translation artificially increases the target database sixfold, and SNP integration requires a suitable database summarizing results from previous experiments. We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq-driven transcript databases. MSProGene is independent of existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm. This also makes it possible to resolve the ambiguity of shared peptides for protein inference. We applied MSProGene to three datasets and show that it enables reliable and accurate database-independent prediction at the gene and protein levels and additionally identifies novel genes. MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/.
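To make the "sixfold increase" concrete, the sketch below shows what a six-frame translation does: every nucleotide sequence is read in three offsets on the forward strand and three on the reverse complement, yielding six candidate protein sequences per input. This is a minimal illustration assuming the standard genetic code; it is not code from MSProGene, and the function names (`reverse_complement`, `translate`, `six_frames`) are hypothetical.

```python
# Illustrative six-frame translation (hypothetical helper names, not MSProGene code).
# Assumes the standard genetic code; "*" marks a stop codon, "X" a codon
# containing a non-ACGT character.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> list[str]:
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


if __name__ == "__main__":
    seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    for n, protein in enumerate(six_frames(seq), start=1):
        print(f"frame {n}: {protein}")
```

Only one of the six frames (here frame 1, `MAIVMGR*KGAR*`) corresponds to the actual coding sequence; the other five enlarge the search space without adding true targets, which is the overhead the abstract describes MSProGene as avoiding by constructing sample-specific transcripts instead.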

SUBMITTER: Zickmann F 

PROVIDER: S-EPMC4765881 | biostudies-literature | 2015 Jun

REPOSITORIES: biostudies-literature


Publications

MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.

Franziska Zickmann, Bernhard Y. Renard

Bioinformatics (Oxford, England), 2015 Jun 1, issue 12



Similar Datasets

| S-EPMC3214026 | biostudies-literature
| S-EPMC4937317 | biostudies-literature
| S-EPMC1462490 | biostudies-other
| S-EPMC7470747 | biostudies-literature
| S-EPMC4436670 | biostudies-literature
| S-EPMC5746410 | biostudies-other
| S-EPMC3453505 | biostudies-literature
| S-EPMC5755964 | biostudies-literature
| S-EPMC5459737 | biostudies-literature
| S-EPMC2992769 | biostudies-literature