Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling.
ABSTRACT: Accurate annotations of protein coding regions are essential for understanding how genetic information is translated into biological functions. The recent development of ribosome footprint profiling provides an important new tool for measuring translation. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,863 previously unannotated coding sequences, including 445 translated sequences in pseudogenes and 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. In aggregate, the novel sequences exhibit significant signatures of purifying selection indicative of protein-coding function, suggesting that many of the novel sequences are functional. We observed that nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Despite evidence for their functional importance, the novel peptide sequences were detected by mass spectrometry at a lower rate than predicted based on data from annotated proteins, thus suggesting that many of the novel peptide products may be relatively short-lived. Our work illustrates the value of ribosome profiling for improving coding annotations, and significantly expands the set of known coding regions.
ORGANISM(S): Homo sapiens
PROVIDER: GSE75290 | GEO | 2015/12/25
SECONDARY ACCESSION(S): PRJNA303958
REPOSITORIES: GEO