Dataset Information

Accurate annotation of human protein-coding small open reading frames.


ABSTRACT: Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-seq strategies require additional measures to ensure comprehensive and accurate smORF annotation. Here, we integrate de novo transcriptome assembly and Ribo-seq into an improved workflow that overcomes obstacles with previous methods, to more confidently annotate thousands of smORFs. Evolutionary conservation analyses suggest that hundreds of smORF-encoded microproteins are likely functional. Additionally, many smORFs are regulated during fundamental biological processes, such as cell stress. Peptides derived from smORFs are also detectable on human leukocyte antigen complexes, revealing smORFs as a source of antigens. Thus, by including additional validation into our smORF annotation workflow, we accurately identify thousands of unannotated translated smORFs that will provide a rich pool of unexplored, functional human genes.

SUBMITTER: Martinez TF 

PROVIDER: S-EPMC7085969 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

Accurate annotation of human protein-coding small open reading frames.

Martinez TF, Chu Q, Donaldson C, Tan D, Shokhirev MN, Saghatelian A

Nature Chemical Biology, 2019-12-09, issue 4


Similar Datasets

2019-07-03 | GSE125218 | GEO
| PRJNA515538 | ENA
| S-EPMC9757701 | biostudies-literature
| S-EPMC7856248 | biostudies-literature
| S-EPMC7394265 | biostudies-literature
2021-04-28 | GSE154491 | GEO
| S-EPMC9375913 | biostudies-literature
| S-EPMC5082802 | biostudies-literature
| S-EPMC3223728 | biostudies-literature
2014-09-11 | E-GEOD-60384 | biostudies-arrayexpress