Dataset Information

Generation of ENSEMBL-based proteogenomics databases boost the identification of novel peptides

ABSTRACT: A novel bioinformatics tool, pypgatk, and the pgdb workflow are presented in this study to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs, and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD, and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, notably optimized target/decoy generation with the DecoyPyrat algorithm. Finally, we perform a reanalysis of four public datasets in PRIDE by generating cell-type-specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to more than 10% of the total number of peptides identified (43,501 out of 402,512).
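
The three-frame translation mentioned in the abstract can be illustrated with a minimal sketch. This is not the pypgatk implementation: the function names `translate` and `three_frame_translation` are hypothetical, the standard genetic code is assumed, and the input is assumed to be a transcript sequence read on the forward strand only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence from its first base.

    Stop codons are emitted as '*', codons with unknown bases as 'X',
    and a trailing incomplete codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq):
    """Return the translations of the three forward reading frames (hypothetical helper)."""
    return [translate(seq[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA"), start=1):
        print(f"frame {frame}: {protein}")
```

In a proteogenomics setting, each frame would typically be split at stop codons and filtered by minimum length before being written to a FASTA database; those steps are left out of this sketch.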

INSTRUMENT(S): Q Exactive HF, Q Exactive

ORGANISM(S): Homo sapiens (human)

TISSUE(S): Lung

DISEASE(S): Lung Adenocarcinoma

SUBMITTER: Yasset Perez-Riverol

LAB HEAD: Yasset Perez-Riverol

PROVIDER: PXD029360 | Pride | 2021-10-26

REPOSITORIES: Pride

ACCESS DATA

Dataset's files

File	Format
000228_A01_P001360_B00A_A00_R1.mzML.gz	mzML
000228_A02_P001360_B00I_A00_R1.mzML.gz	mzML
000228_A03_P001359_B00E_A00_R1.mzML.gz	mzML
000228_A04_P001358_B00A_A00_R1.mzML.gz	mzML
000228_A05_P001358_B00I_A00_R1.mzML.gz	mzML

Showing files 1 - 5 of 1138

Publications

Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides.

Umer HM, Audain E, Zhu Y, Pfeuffer J, Sachsenberg T, Lehtiö J, Branca RM, Perez-Riverol Y

Bioinformatics (Oxford, England), 2022-02-01, issue 5

Summary: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. It also includes exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the too ...

PMID: 34904638

Similar Datasets

Generation of ENSEMBL-based proteogenomics databases boost the identification of novel peptides - Mouse dataset
2021-10-26 | PXD029362 | Pride

Discovery of Non-Canonical Peptides Derived from Novel Small Open Reading Frames as MHC-I Epitopes
2021-11-03 | PXD024415 | Pride

Genome-wide identification of Drought-responsive Regulatory Coding and Non-coding Transcripts from Oryza sativa L. by deep RNA sequencing
2016-09-06 | E-GEOD-74465 | biostudies-arrayexpress

ChIP-seq in the cell line NALM-6 for RUNX1 and ETV6-RUNX1 to assess competition between the two proteins for DNA binding
2022-10-31 | E-MTAB-12209 | biostudies-arrayexpress

Genome-wide identification of Drought-responsive Regulatory Coding and Non-coding Transcripts from Oryza sativa L. by deep RNA sequencing
2016-09-06 | GSE74465 | GEO

BT549 cells depleted of ELP3 compared to control
2020-06-20 | E-MTAB-9206 | biostudies-arrayexpress

Histone H3, lysine 27 acetylation ChIP-seq in NALM6 expressing ETV6-RUNX1
2022-10-31 | E-MTAB-12207 | biostudies-arrayexpress

Nitrogen limitation reveals large reserves in metabolic and translational capacities of yeast
2020-03-12 | E-MTAB-8245 | biostudies-arrayexpress

Dual RNA regulator VcdRP in V. cholerae modulates central metabolism
2021-04-14 | ST001752 | MetabolomicsWorkbench

Nuclear Factor-kappa B inhibition in mouse aortic smooth muscle cells stimulated by cholesterol overload or tumor necrosis factor
2023-12-07 | E-MTAB-13416 | biostudies-arrayexpress