Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

K562 polyA RNA-Seq

ABSTRACT: RNA-Seq reads and TopHat (Trapnell et al. Bioinformatics 2009) alignments of the K562 cell-line transcriptome. These were used to validate the expression of short peptides identified by mass spectrometry in K562 cells. K562 polyA+ RNA (batch 1) and total RNA (batch 2) were purchased from Ambion. We used oligo(dT)-selected polyA+ RNA to construct libraries for RNA-Seq. We then profiled the polyadenylated transcriptome by RNA-Seq on Illumina sequencing platforms and used the sequenced reads to reconstruct the transcriptome with the Cufflinks de novo assembler (Trapnell et al. Nat. Biotechnol. 2010). Recent computational and ribosome profiling analyses suggest that many short open reading frames (sORFs) in eukaryotic genomes are translated. However, evidence that these sORFs produce stable polypeptides is lacking. Here we develop a strategy to discover and validate novel sORF-encoded polypeptides (SEPs) in human cells. In total, we detect 117 SEPs, 114 of which are novel, varying in length from 15 to 149 amino acids. Of these, 10 SEPs (0.5%) are derived from long intergenic non-coding RNAs (lincRNAs). We also observe the presence of polycistronic genes and the widespread use of non-AUG start codons, a phenomenon historically thought to be rare in the mammalian genome. Quantitative measurements reveal that SEPs can be found at concentrations of ~10-2000 copies per cell, which is within the range of typical cellular proteins. We confirm the translation of a number of these SEPs through heterologous expression of their encoding cDNAs. We also discover that several SEPs possess properties characteristic of functional proteins. These results demonstrate that human sORFs produce numerous stable polypeptides, revealing that the human proteome is larger and more diverse than previously appreciated.
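
To make the notion of an sORF concrete, the following is a minimal sketch of scanning the forward strand of a single transcript sequence for short open reading frames. It is not the pipeline used in the study. The function name `find_sorfs`, the choice of near-cognate start codons, and the use of the reported 15 to 149 amino acid range as default length bounds are all assumptions made for illustration.

```python
# Illustrative sketch only; not the method used to produce this dataset.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# ATG plus a few near-cognate codons (an assumed, hypothetical selection).
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def find_sorfs(seq, min_aa=15, max_aa=149, starts=START_CODONS):
    """Scan the three forward frames of `seq` for short ORFs.

    Returns a sorted list of (start, end, n_aa) tuples, where `start` is the
    0-based index of the start codon, `end` is the index just past the stop
    codon, and `n_aa` counts codons from the start codon up to (but not
    including) the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        open_starts = []  # start codons seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                for s in open_starts:
                    n_aa = (i - s) // 3
                    if min_aa <= n_aa <= max_aa:
                        hits.append((s, i + 3, n_aa))
                open_starts = []
            elif codon in starts:
                open_starts.append(i)
    return sorted(hits)


if __name__ == "__main__":
    # Toy transcript: ATG followed by 15 alanine codons and a stop codon.
    toy = "ATG" + "GCC" * 15 + "TAA"
    print(find_sorfs(toy))
```

Passing `starts={"ATG"}` restricts the scan to canonical start codons, which makes it easy to see how many candidates depend on the non-AUG assumption. Every in-frame start upstream of a stop is reported, so nested candidates sharing one stop codon appear as separate entries.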

ORGANISM(S): Homo sapiens

SUBMITTER: Nataly Cabili

PROVIDER: E-GEOD-34740 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Peptidomic discovery of short open reading frame-encoded peptides in human cells.

Slavoff SA, Mitchell AJ, Schwaid AG, Cabili MN, Ma J, Levin JZ, Karger AD, Budnik BA, Rinn JL, Saghatelian A

Nature Chemical Biology 2012-11-18 1

The complete extent to which the human genome is translated into polypeptides is of fundamental importance. We report a peptidomic strategy to detect short open reading frame (sORF)-encoded polypeptides (SEPs) in human cells. We identify 90 SEPs, 86 of which are previously uncharacterized, which is the largest number of human SEPs ever reported. SEP abundances range from 10-1,000 molecules per cell, identical to abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as well as mul ...

PMID: 23160002

Similar Datasets

PolyA Sequencing of K562 Cells
| PRJNA607080 | ENA

Small Protein Enrichment-Based Proteogenomics Identifies Plentiful Missing Proteins and Three Novel sORFs in Saccharomyces cerevisiae
2018-06-15 | PXD008586 | Pride

DDA-PASEF and diaPASEF acquired A549/K562 proteomic datasets with deliberate batch effects
2023-11-21 | PXD041421 | Pride

RNA-seq from ENCODE/Caltech (Mouse)
2012-05-10 | GSE37909 | GEO

Expression analysis of Shigella flexneri 2a strain 301
2011-06-27 | E-GEOD-22800 | biostudies-arrayexpress

The small proteome of the nitrogen-fixing plant symbiont Sinorhizobium meliloti
2023-02-14 | GSE206492 | GEO