Project description:Post-translationally spliced peptides bound to HLA are a unique type of peptide, shown in cancer for HLA-class-I. Thus far, no consensus has been reached on the extent to which post-translationally spliced peptides (PTSPs) occur, stirring significant debate. Furthermore, the role of the HLA-class-II pathway has been studied only in diabetes. Here, we exploit our large-scale cancer peptidomics database and devise a pipeline to filter spliced peptide predictions, to identify recurring spliced peptides, both for HLA-class-I and -II. Our results indicate that HLA-class-I spliced peptides account for a low percentage (4.4%) of the immunopeptidome, yet are larger in number relative to other types of identified aberrant peptides. Therefore, spliced peptides contribute significantly to the repertoire of presented peptides in cancer cells. In addition, HLA-class-II bound spliced peptides were identified as well, but to a lower extent (0.6%).
Project description:Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
Project description:The diversity of peptides displayed by class I HLA plays an essential role in T cell immunity. The peptide repertoire is extended by various post-translational modifications, including cis-splicing of peptides from the same protein. Here, we have applied a novel bioinformatic workflow and demonstrate that spliced-peptides are also generated through trans-splicing (fusion of peptide segments from distinct antigens) and their abundance challenges current models of proteasomal-splicing that predict cis-splicing as the most probable outcome. These trans-spliced peptides display canonical HLA binding motif sequence features. These results highlight the unanticipated diversity of the immunopeptidome and have important implications for autoimmunity, vaccine design and immunotherapy.
Project description:Background: Post-translational modification (PTM) of proteins is central to many cellular processes across all domains of life, but despite decades of study and a wealth of genomic and proteomic data the biological function of many PTMs remains unknown. This is especially true for prokaryotic PTM systems, many of which have only recently been recognized and studied in depth. It is increasingly apparent that a deep sampling of abundance across a wide range of environmental stresses, growth conditions, and PTM types, rather than simply cataloging targets for a handful of modifications, is critical to understanding the complex pathways that govern PTM deposition and downstream effects. Results: We utilized a deeply-sampled dataset of MS/MS proteomic analysis covering 9 timepoints spanning the Escherichia coli growth cycle and an unbiased PTM search strategy to construct a temporal map of abundance for all PTMs within a 400 Da window of mass shifts. Using this map, we are able to identify novel targets and temporal patterns for N-terminal Nα-acetylation, C-terminal glutamylation, and asparagine deamidation. Furthermore, we identify a possible relationship between N-terminal Nα-acetylation and regulation of protein degradation in stationary phase, pointing to a previously unrecognized biological function for this poorly-understood PTM. Conclusions: Unbiased detection of PTM in MS/MS proteomics data facilitates the discovery of novel modification types and previously unobserved dynamic changes in modification across growth timepoints.
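As a rough illustration of what a temporal map of mass-shift abundance could look like, the sketch below bins observed mass shifts to the nearest dalton and counts them per timepoint. It is not the authors' pipeline: the function name, the input format (pairs of timepoint and mass shift), the 1 Da binning, and the reading of the 400 Da window as -200 to +200 Da are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def temporal_shift_map(psms, low=-200.0, high=200.0):
    """Count peptide-spectrum matches per (rounded mass shift, timepoint).

    psms: iterable of (timepoint, mass_shift_da) pairs (hypothetical format).
    low/high: assumed bounds of the mass-shift window, in Da.
    """
    table = defaultdict(Counter)
    for timepoint, shift in psms:
        # Keep only shifts inside the assumed search window.
        if low <= shift <= high:
            # Bin to the nearest whole dalton (a coarse, illustrative choice).
            table[round(shift)][timepoint] += 1
    return table


# Toy input: acetylation (+42.011 Da), deamidation (+0.984 Da),
# and one shift that falls outside the window.
example = [(1, 42.011), (1, 42.010), (9, 42.011), (3, 0.984), (5, 350.0)]
shift_map = temporal_shift_map(example)
```

A real analysis would normalize these counts by the total number of identifications at each timepoint before comparing abundances across the growth cycle.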
Project description:Peptides displayed by MHC molecules on a cell’s surface, referred to as its immunopeptidome, play an important role in the adaptive immune response. Antigen processing for MHC class I presentation is a ubiquitous pathway present in all nucleated cells, which generate and present peptides of both self and non-self origin. Peptides with post-translational modifications (PTMs) are one of the classes of peptides presented by MHC class I molecules. However, due to the high background of self-peptides presented by the cells, the diversity of peptides with post-translational modifications is not well reported. In this study, we have carried out MHC class I immunopeptidomics analysis on Jurkat and A375 cell lines to characterize the diversity of post-translational modifications among MHC class I peptides. Using high resolution mass spectrometry, we identified 25,761 MHC-bound peptides across both cell lines using the Bolt and Sequest search engines. High specificity of the enrichment method is demonstrated by identifying ~90% of the peptides with a typical length distribution of 8-12 aa and enriched motifs within those peptides similar to the binding motifs of MHC alleles. Among the MHC-bound peptides, we identified phosphorylation as a major post-translational modification, followed by deamidation. We observed site-specific localization of these post-translational modifications, at position P4 for phosphorylated peptides and position P3 for deamidated peptides. We identified a smaller number of peptides with acetylated and methylated lysine, possibly due to very low stoichiometric levels of these post-translational modifications compared to phosphorylation and deamidation. Using the PEAKS de novo sequencing algorithm, we identified spliced peptides that account for ~5-7% of MHC-bound peptides across the two cell lines. These peptides share similar features with normal MHC-bound peptides, such as peptide length distribution and binding motifs.
We validated the identification of several post-translationally modified peptides and spliced peptides using synthetic peptide sequences. In conclusion, our study demonstrates unbiased identification of these low stoichiometric PTMs and unusual spliced peptides using high resolution mass spectrometry.
Project description:Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature by using distantly supervised training data using deep learning to aid human curation. Method: We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the use of the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and to extract high confidence predictions. Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filtered approximately 5700 (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration.
We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
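The idea of combining high average confidence with low variation across ensemble members can be sketched as follows. This is a minimal illustration, not the PPI-BioBERT-x10 implementation: the function name, the input layout (one confidence per ensemble member for the predicted class of each example), and both threshold values are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def select_high_confidence(ensemble_probs, min_mean=0.9, max_std=0.05):
    """Return indices of examples the ensemble agrees on with high confidence.

    ensemble_probs: list of lists; each inner list holds one confidence score
    per ensemble member for the same predicted class (hypothetical layout).
    min_mean, max_std: illustrative thresholds, not values from the study.
    """
    selected = []
    for idx, probs in enumerate(ensemble_probs):
        avg = mean(probs)        # ensemble average confidence
        spread = pstdev(probs)   # variation across ensemble members
        if avg >= min_mean and spread <= max_std:
            selected.append(idx)
    return selected


scores = [
    [0.98, 0.97, 0.99],  # high mean, low variation: kept
    [1.00, 1.00, 0.75],  # high mean, but members disagree: dropped
    [0.50, 0.50, 0.50],  # consistent, but low confidence: dropped
]
kept = select_high_confidence(scores)
```

Tightening either threshold trades recall for precision, which matches the precision-oriented tuning described above.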
Project description:Heat stress is one of the most prominent and deleterious environmental threats affecting plant growth and development. Upon exposure to high temperatures, plants launch specialized gene expression programs that promote stress protection and survival. These programs involve global and specific changes at the transcriptional and translational levels. However, the coordination of these processes and their specific role in the establishment of the heat stress response are not fully elucidated. In this report, we have carried out a genome-wide analysis to simultaneously monitor the individual changes in the transcriptional and translational mRNA levels of Arabidopsis thaliana seedlings after exposure to heat shock stress. Our results demonstrate that, superimposed on transcription, translation exerts a wide but dual regulation of gene expression. For the majority of mRNAs, translation is severely repressed, causing a 50% decrease in the association of the bulk of mRNAs with polysomes. However, some relevant mRNAs involved in different aspects of homeostasis maintenance follow a differential pattern of translation. Analysis of the sequences of the differentially translated mRNAs reveals that some features, like the 5′UTR G+C content and the cDNA length, may take part in the discrimination mechanisms for mRNA polysome loading. Master regulators of the stress response stand out among the differentially translated genes, highlighting the main role of translation in the early establishment of the physiological response of plants to elevated temperatures. In total, 8 ATH1 Affymetrix GeneChips were hybridized with all combinations of two factors: total mRNA/polysome-bound RNA; 22°C/38°C. Two biological replicates per sample type were performed.
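The chip count follows from the design: two RNA fractions times two temperatures times two biological replicates gives eight hybridizations. The short sketch below simply enumerates that layout. The variable names and the dictionary structure are illustrative assumptions, not part of the original dataset annotation.

```python
from itertools import product

# The two experimental factors and the replicate count stated in the description.
rna_fractions = ["total mRNA", "polysome-bound RNA"]
temperatures_c = [22, 38]
replicates = [1, 2]

# One entry per hybridized chip: every factor combination, in duplicate.
design = [
    {"fraction": fraction, "temperature_c": temp, "replicate": rep}
    for fraction, temp, rep in product(rna_fractions, temperatures_c, replicates)
]

for chip in design:
    print(chip)
```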
Project description:Background: Agent-based models (ABM) are useful to explore population-level scenarios of disease spread and containment, but typically characterize infected individuals using simplified models of infection and symptom dynamics. Adding more realistic models of individual infections and symptoms may help to create more realistic population-level epidemic dynamics. Methods: Using an equation-based, host-level mathematical model of influenza A virus infection, we develop a function that expresses the dependence of infectivity and symptoms of an infected individual on initial viral load, age, and viral strain phenotype. We incorporate this response function in a population-scale agent-based model of influenza A epidemics to create a hybrid multiscale modeling framework that reflects both population dynamics and individualized host response to infection. Results: At the host level, we estimate parameter ranges using experimental data of H1N1 viral titers and symptoms measured in humans. By linearization of the symptom responses of the host-level model, we obtain a map of the parameters of the model that characterizes clinical phenotypes of influenza infection and immune response variability over the population. At the population level, we analyze the effect of individualizing viral response in the agent-based model by simulating epidemics across Allegheny County, Pennsylvania under both age-specific and age-independent severity assumptions. Conclusions: We present a framework for multi-scale simulations of influenza epidemics that enables the study of population-level effects of individual differences in infections and symptoms, with minimal additional computational cost compared to the existing population-level simulations.
Project description:In this study, we show that pediatric T-cell acute lymphoblastic leukemia (T-ALL) has an alternative mechanism for aberrant splicing that involves post-translational regulation of the splicing machinery via deubiquitination.