Unraveling the Complex Relationship Between mRNA and Protein Abundance: A Machine Learning-Based Approach for Imputing Protein Levels from RNA-seq Data
ABSTRACT: The context-dependent correlation between mRNA and protein abundance has long been debated. RNA sequencing (RNA-seq) is a high-throughput, commonly used method for analyzing transcriptional dynamics and identifying biomarkers, but it remains unclear whether RNA-seq-identified gene signatures translate directly to protein changes. In this study, we used a set of 17 widely assessed immune and wound healing mediators in the context of canine volumetric muscle loss (VML) to investigate the correlation between mRNA and protein abundance. Our data reveal an overall agreement between mRNA and protein levels for these 17 mediators when examining samples from the same experimental condition, such as the same wound biopsy. However, we observed a lack of correlation between mRNA and protein levels for individual genes across different conditions, underscoring the challenges in converting transcript-level changes directly into corresponding protein-level changes. As an initial attempt to address this discrepancy, we developed a machine-learning model to predict protein abundances from RNA-seq data, achieving high accuracy (Spearman's rho: 0.78-0.99, imputed versus measured protein abundance; pooling all biopsies; 5-fold cross-validation). Our approach also effectively corrected multiple extreme outliers measured by antibody-based protein assays. Additionally, this model has the potential to detect post-translational modification events, as shown by its accurate estimation of activated transforming growth factor (TGF)-β1 levels. While preliminary, this study introduces a promising strategy for translating RNA-seq data into protein abundance and the associated biological relevance.
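The accuracy figure in the abstract (Spearman's rho between imputed and measured protein abundance, pooled over held-out folds of a 5-fold cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the one-variable least-squares fit is only a stand-in for the study's machine-learning model (which this record does not specify), and the function names are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fit_line(x, y):
    """Least-squares fit of y = a + b*x (placeholder for the real model)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def pooled_cv_predictions(x, y, k=5, seed=0):
    """Predict every sample from a model trained on the other k-1 folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [None] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = a + b * x[i]
    return preds


# Synthetic stand-in data: one mediator, 50 samples (not the study's data).
rng = random.Random(42)
mrna = [rng.uniform(0, 10) for _ in range(50)]
protein = [2.0 + 0.8 * m + rng.gauss(0, 1.0) for m in mrna]

imputed = pooled_cv_predictions(mrna, protein, k=5)
rho = spearman_rho(imputed, protein)
print(f"Pooled 5-fold CV Spearman's rho: {rho:.2f}")
```

Pooling the held-out predictions before computing a single rho mirrors the "pooling all biopsies" wording; averaging per-fold rho values would be a different, equally plausible reading of the abstract.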
ORGANISM(S): Canis lupus
PROVIDER: GSE242973 | GEO | 2024/02/26
REPOSITORIES: GEO