Discovery of lincRNA-encoded Peptides: An Integrated Transcriptomics, Proteomics and Bioinformatics Approach
ABSTRACT: Long noncoding RNA (lncRNA) refers to the family of RNA transcripts longer than 200 nucleotides that are not thought to encode proteins. lincRNAs (long intergenic noncoding RNAs) are a subset of lncRNAs that do not overlap known genes. Increasing evidence shows that some of these transcripts do in fact contain open reading frames (ORFs) that encode short peptides with significant functional roles in the cell. However, many of these peptides remain unannotated and uncharacterized. This study proposes a workflow that integrates proteomics, transcriptomics and bioinformatics specifically for discovering lincRNA-encoded peptides. The workflow was tested on the mouse kidney inner medulla (IM), a region that contains the collecting duct system responsible for regulated water transport. In brief, short peptides (2 to 20 kDa) were enriched on a tricine protein gel, digested in-gel with trypsin, and analyzed by high-resolution mass spectrometry. However, matching fragment ion spectra to peptide sequences requires a reference peptide sequence database, which is not available for noncoding transcripts and must be generated de novo for the sample of interest. We modified the RNA-Seq mapping workflow to filter out coding reads first, which improves quantification of noncoding transcript expression. In addition, a rule-based ORF prediction was implemented to select the single best predicted ORF for each noncoding transcript when constructing the peptide library. Candidates were further evaluated using several quality-control criteria and bioinformatics tools. Three candidates, conserved in rat and human, passed all criteria and may be truly novel coding genes. In summary, we present a workflow based on modern transcriptomics and proteomics technologies for lincRNA-encoded peptide discovery. A key computational challenge is generating a hypothetical lincRNA-encoded peptide database for peptide-spectrum matching.
With this workflow, we discovered three previously unannotated peptides in the mouse kidney inner medulla. The same workflow can be applied to any cell or tissue type of interest to quickly advance this research field.
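The abstract does not list the rules used to pick one ORF per noncoding transcript. The Python sketch below illustrates one plausible rule set under stated assumptions, not the authors' published criteria. It assumes ORFs run from ATG to the first in-frame stop, that only the three forward frames of a stranded transcript are scanned, that ORFs shorter than a hypothetical `MIN_AA = 10` residues are discarded, and that the longest ORF wins. It also checks the predicted product against the 2 to 20 kDa window used for gel enrichment, using average residue masses. The function names `find_orfs`, `best_orf` and `in_gel_window` are illustrative only.

```python
"""Hypothetical sketch of a rule-based 'one best ORF per transcript' step.

Assumed rules (illustrative only, not the published criteria):
  * an ORF starts at ATG and ends at the first in-frame stop codon,
  * only the three forward frames are scanned (stranded transcripts),
  * ORFs shorter than MIN_AA residues are discarded,
  * the longest ORF wins; ties go to the most 5' start.
Input is assumed to contain only A/C/G/T (or U).
"""
from typing import List, Optional, Tuple

MIN_AA = 10  # hypothetical minimum ORF length, in amino acids

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Average residue masses in Da; a free peptide adds one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def translate(cds: str) -> str:
    """Translate a stop-free coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def find_orfs(seq: str, min_aa: int = MIN_AA) -> List[Tuple[int, int]]:
    """Return (start, end) spans of ATG...stop ORFs; end includes the stop.

    ORFs that run off the 3' end without a stop codon are ignored.
    """
    orfs = []
    for frame in range(3):
        start: Optional[int] = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start = pos  # first ATG in frame gives the longest ORF
            elif CODON_TABLE[codon] == "*":
                if (pos - start) // 3 >= min_aa:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ATG after this stop
    return orfs


def best_orf(transcript: str, min_aa: int = MIN_AA) -> Optional[str]:
    """Select one ORF per transcript and return its peptide (without stop)."""
    seq = transcript.upper().replace("U", "T")
    orfs = find_orfs(seq, min_aa)
    if not orfs:
        return None
    # Longest ORF first; on a tie, prefer the smaller (more 5') start.
    start, end = max(orfs, key=lambda span: (span[1] - span[0], -span[0]))
    return translate(seq[start:end - 3])


def average_mass(peptide: str) -> float:
    """Average (not monoisotopic) mass of an unmodified peptide, in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide) + WATER


def in_gel_window(peptide: str, low_da: float = 2000.0,
                  high_da: float = 20000.0) -> bool:
    """Check the product against the 2 to 20 kDa enrichment window."""
    return low_da <= average_mass(peptide) <= high_da


if __name__ == "__main__":
    transcript = "GCC" + "ATG" + "GCT" * 12 + "TAA" + "GG"  # toy sequence
    peptide = best_orf(transcript)
    print(peptide, round(average_mass(peptide), 2), in_gel_window(peptide))
```

Choosing the longest ORF is a common, simple heuristic swapped in here for illustration; a real pipeline would likely add further rules (for example Kozak context or conservation), and the selected peptides would then be written to a FASTA database for the search engine to digest in silico.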
INSTRUMENT(S): Orbitrap Fusion
ORGANISM(S): Mus musculus (mouse)
TISSUE(S): Epithelial Cell, Kidney
SUBMITTER: CHIN-RANG YANG
LAB HEAD: CHIN-RANG YANG
PROVIDER: PXD013892 | Pride | 2020-05-12
REPOSITORIES: Pride