Project description: Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. In this work, we compared RNC-seq and Ribo-seq in the context of proteome identification, especially for identifying protein isoforms arising from AS. We also demonstrated that single-molecule long-read sequencing identified thousands of new splice variants and guided the MS identification of new protein isoforms.
Project description: Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. Here, we demonstrated a full-length translating mRNA (ribosome nascent-chain complex-bound mRNA, RNC-mRNA) sequencing (RNC-seq) strategy that sequences the entire translating mRNA using next-generation sequencing, including short-read and long-read technologies, to construct a protein database containing all translating AS isoforms. Taking advantage of read length, short-read RNC-seq identified up to 15,289 genes and 15,906 AS isoforms in a single human cell line, many more than Ribo-seq. Single-molecule long-read RNC-seq supplemented 4,429 annotated AS isoforms that were not identified by short-read datasets, and 4,525 novel AS isoforms that were not included in the public databases. Using this RNC-seq-guided database, we identified 6,766 annotated protein isoforms and 50 novel protein isoforms in mass spectrometry datasets. These results demonstrated the potential of full-length RNC-seq in investigating the proteome of AS isoforms.
Project description: Oocyte-to-embryo transition plays a critical role in oocyte maturation and embryogenesis. It is a highly regulated process, in part due to a transcriptionally silent period followed by zygotic genome activation. How the transcriptome, translatome, and proteome interact in this critical developmental window remains poorly understood. Using highly sensitive mass spectrometry, we obtained a high-quality proteome landscape spanning 10 stages, from the mouse full-grown oocyte (FGO) to blastocyst, using 100 oocytes/embryos at each stage. By integrative analysis with the corresponding transcriptome and translatome, we found that transcription and translation levels do not reflect protein abundance in most cases. From FGO to 4-cell embryos, proteomes are dominated by FGO-produced proteins, while the transcriptome and translatome are much more dynamic. FGO-inherited proteins frequently persist after the corresponding transcripts are already downregulated or decayed. Improved concordance between protein and RNA is observed for genes starting translation only upon meiotic resumption (OET upregulated) or transcribed only in embryos, although the detected protein dynamics often lag behind transcription and translation. Concordance between protein and transcription/translation is associated with protein half-lives. Finally, a kinetic model predicts protein dynamics well when incorporating both the initial protein abundance in FGO and translation kinetics across developmental stages. In sum, our study reveals multilayer control of gene expression during oocyte maturation and embryogenesis.
Project description: Alternative splicing of pre-mRNA generates protein diversity and has been linked to cancer progression and drug response. Exon microarray technology enables genome-wide quantification of expression levels for the majority of exons and facilitates the discovery of alternative splicing events. Analysis of exon array data is more challenging than gene expression data and there is a need for reliable quantification of exons and alternatively spliced variants. We introduce a novel, computationally efficient methodology, MEAP, for exon array data preprocessing, analysis and visualization. We compared MEAP with other preprocessing methods, and validation of the results shows that MEAP produces reliable quantification of exons and alternatively spliced variants. Analysis of data from head and neck squamous cell carcinoma (HNSCC) cell lines revealed several variants associated with 11q13 amplification, which is a predictive marker of metastasis and decreased survival in HNSCC patients. Together these results demonstrate the utility of MEAP in suggesting novel experimentally testable predictions. Thus, in addition to novel methodology to process large-scale exon array data sets, our results provide several HNSCC candidate genes for further studies.
Project description: Alternative splicing of pre-mRNA generates protein diversity and has been linked to cancer progression and drug response. Exon microarray technology enables genome-wide quantification of expression levels for the majority of exons and facilitates the discovery of alternative splicing events. Analysis of exon array data is more challenging than gene expression data and there is a need for reliable quantification of exons and alternatively spliced variants. We introduce a novel, computationally efficient methodology, MEAP, for exon array data preprocessing, analysis and visualization. We compared MEAP with other preprocessing methods, and validation of the results shows that MEAP produces reliable quantification of exons and alternatively spliced variants. Analysis of data from head and neck squamous cell carcinoma (HNSCC) cell lines revealed several variants associated with 11q13 amplification, which is a predictive marker of metastasis and decreased survival in HNSCC patients. Together these results demonstrate the utility of MEAP in suggesting novel experimentally testable predictions. Thus, in addition to novel methodology to process large-scale exon array data sets, our results provide several HNSCC candidate genes for further studies. We analyzed 15 samples using the Affymetrix Human Exon 1.0 ST platform, of which 7 samples have 11q13 amplification. Array data were preprocessed using Multiple Exon Array Processing (MEAP).
Project description: Thousands of human genes contain introns ending in NAGNAG motifs (N = any nucleotide), where both NAGs can function as 3' splice sites, yielding isoforms differing by inclusion/exclusion of just three bases. However, the functional importance of NAGNAG alternative splicing is highly controversial. Using very deep RNA-Seq data from sixteen human and eight mouse tissues, we found that approximately half of alternatively spliced NAGNAGs undergo tissue-specific regulation and that regulated events have been selectively retained: alternative splicing of strongly tissue-specific NAGNAGs was ten times as likely to be conserved between species as for non-tissue-specific events. Further, alternative splicing of human NAGNAGs was associated with an order of magnitude increase in the frequency of exon length changes at orthologous mouse/rat exon boundaries, suggesting that NAGNAGs accelerate exon evolution. Together, our analyses show that NAGNAG alternative splicing constitutes a major generator of tissue-specific proteome diversity and accelerates evolution of proteins at exon-exon boundaries. mRNA-Seq of sixteen human and eight mouse tissues. Supplementary files: human.nagnag.junctions.gff and mouse.nagnag.junctions.gff are the annotation files (in GFF3 format) corresponding to the 'bwtout' mapped reads files linked to the Sample records. Raw data files provided for Samples GSM742937-GSM742952 only.
Project description: Thousands of human genes contain introns ending in NAGNAG motifs (N = any nucleotide), where both NAGs can function as 3' splice sites, yielding isoforms differing by inclusion/exclusion of just three bases. However, the functional importance of NAGNAG alternative splicing is highly controversial. Using very deep RNA-Seq data from sixteen human and eight mouse tissues, we found that approximately half of alternatively spliced NAGNAGs undergo tissue-specific regulation and that regulated events have been selectively retained: alternative splicing of strongly tissue-specific NAGNAGs was ten times as likely to be conserved between species as for non-tissue-specific events. Further, alternative splicing of human NAGNAGs was associated with an order of magnitude increase in the frequency of exon length changes at orthologous mouse/rat exon boundaries, suggesting that NAGNAGs accelerate exon evolution. Together, our analyses show that NAGNAG alternative splicing constitutes a major generator of tissue-specific proteome diversity and accelerates evolution of proteins at exon-exon boundaries.
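Illustrative note: the NAGNAG mechanism described above (two adjacent NAG triplets at the end of an intron, either of which can serve as the 3' splice site) can be sketched in a few lines of Python. This is a minimal toy example and not the pipeline used in the study. The function name `nagnag_isoforms`, the DNA-alphabet input, and the example sequences are all assumptions made for illustration.

```python
import re

# An intron qualifies if its last six bases are NAG followed by NAG
# (N = any of A, C, G, T). Hypothetical helper, for illustration only.
NAGNAG_AT_END = re.compile(r"[ACGT]AG[ACGT]AG$")


def nagnag_isoforms(upstream_exon, intron, downstream_exon):
    """Return the two spliced sequences produced by a NAGNAG acceptor.

    Splicing at the downstream NAG removes the whole intron.
    Splicing at the upstream NAG leaves the last three intron bases
    (the second NAG) attached to the downstream exon.
    """
    intron = intron.upper()
    if not NAGNAG_AT_END.search(intron):
        raise ValueError("intron does not end in a NAGNAG motif")
    short_form = upstream_exon + downstream_exon
    long_form = upstream_exon + intron[-3:] + downstream_exon
    return short_form, long_form


# Toy sequences (invented): the intron ends in CAGCAG.
short_form, long_form = nagnag_isoforms("ATGGCC", "GTAAGTTTTTTCAGCAG", "GTTTAA")
print(short_form)
print(long_form)
```

The two outputs differ by exactly three bases, which is the inclusion/exclusion difference the description refers to. A real analysis would work from genome coordinates and junction-spanning reads rather than from hand-written strings.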