Project description: The paper "Metabolomic Machine Learning Predictor for Diagnosis and Prognosis of Gastric Cancer" addresses the need for non-invasive diagnostic tools for gastric cancer (GC). Traditional methods such as endoscopy are invasive and expensive. The authors conducted a targeted metabolomics analysis of 702 plasma samples to develop machine learning models for GC diagnosis and prognosis. The diagnostic model, which uses 10 metabolites, achieved a sensitivity of 0.905, outperforming conventional protein marker-based methods. The prognostic model stratified patients into risk groups more effectively than traditional clinical models.
I have successfully reproduced the diagnosis model from the paper. This machine learning-based system differentiates GC patients from non-GC controls using metabolomics data from plasma samples analyzed by liquid chromatography-mass spectrometry (LC-MS). The model focuses on 10 metabolites, including succinate, uridine, lactate, and serotonin. Employing LASSO regression and a random forest classifier, the model achieved an AUROC of 0.967, with a sensitivity of 0.854 and specificity of 0.926. This model significantly outperforms traditional diagnostic methods and underscores the potential of integrating machine learning with metabolomics for early GC detection and treatment.
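The metrics quoted above (AUROC, sensitivity, specificity) can be computed from a classifier's predicted scores as in the minimal sketch below. It is not the paper's pipeline: the labels, scores, and the 0.5 decision threshold are made-up toy values, and the function names are hypothetical. It only illustrates how the three numbers are defined.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = GC, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the study): 4 GC samples and 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 on the predicted probability.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens}, specificity={spec}, AUROC={auroc(y_true, scores)}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes performance across all thresholds, which is why a model is usually reported with all three.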
Project description: The small proteome has already been well explored in eukaryal and bacterial species, but so far archaeal genomes have not been analysed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach that integrates experimental information from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq) to generate a high-confidence inventory of small proteins in the model archaeon Haloferax volcanii. Translation was demonstrated for 67% of the annotated small coding sequences by both methods. Annotation-independent data analysis allowed for the prediction of 47 sites of ribosomal engagement outside known coding regions by Ribo-seq, seven of which correspond to the eight un-annotated small proteins identified by a similar independent analysis of proteomic data. We also present independent in vivo evidence for the translation of a subset of small proteins (both previously annotated and newly identified), underlining the validity of our identification scheme. Moreover, several of these translated sORFs are conserved in Haloferax and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously expected and that the combined use of mass spectrometry to detect protein presence with Ribo-seq to inform on translation is a powerful tool for the discovery of new small protein-coding genes in diverse organisms. This dataset contains the results of an MSFragger search against a database derived from a six-frame translation of the genome; the results were mapped to the genome with Stephan Fuchs's “Salt & Pepper” software suite for bacterial proteogenomics.
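The six-frame translation mentioned above can be illustrated with a short sketch. This is not the pipeline used to build the deposited database: the function names are hypothetical, the input is a toy sequence, and the standard codon-to-amino-acid mapping is assumed, with stop codons written as `*` and incomplete or ambiguous codons as `X`.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard mapping, '*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Translate all three forward and all three reverse-strand frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy input (assumed): a 9-nt sequence encoding Met-Ala-stop in frame +1.
for name, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(name, protein)
```

In a proteogenomic search, each frame would typically be split at stop codons and the resulting open reading frames written to a FASTA file, so that peptides can be matched without relying on the existing genome annotation.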
Project description: Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify posttranslational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables Global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified nearly 2500 unique, high-confidence modified peptides comprising 31 different PTM types in single-pass database searches.