Project description:Proteogenomics methods have identified many non-annotated protein-coding genes in the human genome. Many of the newly discovered protein-coding genes encode peptides and small proteins, referred to collectively as microproteins. Microproteins are produced through ribosomal translation of small open reading frames (smORFs). The discovery of many smORFs reveals a blind spot in traditional gene-finding algorithms for these genes. Biological studies have found roles for microproteins in cell biology and physiology, and the possibility that additional bioactive microproteins exist drives interest in the detection and discovery of these molecules. A key step in any proteogenomics workflow is the assembly of RNA-Seq data into likely mRNA transcripts that are then used to create a searchable protein database. Here we demonstrate that specific features of the assembled transcriptome impact microprotein detection by shotgun proteomics. By tailoring transcript assembly for downstream mass spectrometry searching, we show that we can detect more than double the number of high-quality microprotein candidates, and we introduce a novel open-source mRNA assembler for proteogenomics (MAPS) that incorporates all of these features. By integrating our specialized assembler, MAPS, and a popular generalized assembler into our proteogenomics pipeline, we detect 45 novel human microproteins from a high-quality proteogenomics dataset of a human cell line. We then characterize the features of the novel microproteins, identifying two classes of microproteins. Our work highlights the importance of specialized transcriptome assembly upstream of proteomics validation when searching for short, potentially rare, and poorly conserved proteins.
Project description:Lung cancer remains the leading cause of cancer-related mortality worldwide, with limited treatment options for advanced stages. This proteogenomics study aims to integrate multi-omics approaches, including proteomics, genomics, and transcriptomics, to elucidate the molecular mechanisms underlying lung cancer progression and treatment resistance. By leveraging cutting-edge technologies, this study seeks to identify novel biomarkers and therapeutic targets, enabling personalized medicine strategies to improve patient outcomes. The integration of proteogenomic data will provide a comprehensive understanding of tumor biology, revealing critical pathways and interactions that drive tumorigenesis and immune evasion.
Project description:We have developed a proteogenomics-based approach for identification of human MHC class I-associated peptides, including those deriving from polymorphisms, mutations and non-canonical reading frames.
Project description:Pilocytic astrocytoma (PA) is the most common pediatric brain tumor and is driven by aberrant MAPK signaling, typically mediated by BRAF alterations. While five-year overall survival rates exceed 95%, tumor recurrence constitutes a major clinical challenge in incompletely resected tumors despite chemotherapeutic or radiation-based therapies. Therefore, we used proteogenomics to discern the biological heterogeneity of PA to improve classification of this tumor entity and identify novel therapeutic targets. Our proteogenomics approach integrates RNA sequencing and LC/MS-based proteomic profiling data from a cohort of 58 confirmed, primary PA samples. An integrative genomics approach was conducted to discern the biological heterogeneity of PA and to identify aberrant pathway activation in these biological subgroups. In summary, pilocytic astrocytomas segregate into two groups, with younger patients significantly associated with Group 1. Importantly, we validate the two distinct biological subgroups in two non-overlapping cohorts. The biological heterogeneity seen here may improve biological classification and reveal novel therapeutic targets specifically useful for non-resectable tumors with high risk of recurrent or progressive disease.
Project description:Conventional prokaryotic RNA labeling methods usually require large amounts of starting material and tend to generate high background signals. Recently, two novel methods based on amplification systems were introduced. Here, we compared three alternative strategies for prokaryotic RNA labeling: the direct labeling method, the polyadenylation-involved oligo-dT priming amplification method and the random priming amplification method (hereafter referred to as the DL, PAOD and RPA methods, respectively), using expression profiling in an Escherichia coli (E. coli) heat shock model.
Project description:We present Prokaryotic Expression-profiling by Tagging RNA In Situ and sequencing (PETRI-seq), a high-throughput prokaryotic scRNA-seq pipeline. We demonstrated that PETRI-seq effectively barcoded single bacterial cells in a species-mixing experiment with E. coli (MG1655) and S. aureus (USA300). Within the S. aureus population, we found rare prophage induction in 0.04% of cells. We further demonstrated that PETRI-seq was able to distinguish between E. coli growth phases based on mRNA expression patterns by combining stationary E. coli with exponential E. coli in multiple experiments.