Dataset Information

LYRUS: a machine learning model for predicting the pathogenicity of missense variants.

ABSTRACT:

Summary

Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights into the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS's performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53.

Availability and implementation

LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

SUBMITTER: Lai J

PROVIDER: S-EPMC8754197 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

LYRUS: a machine learning model for predicting the pathogenicity of missense variants.

Jiaying Lai, Jordan Yang, Ece D. Gamsiz Uzun, Brenda M. Rubenstein, Indra Neil Sarkar

Bioinformatics Advances, 2021-12-25, 1

PMID: 35036922

Similar Datasets

MLb-LDLr: A Machine Learning Model for Predicting the Pathogenicity of LDLr Missense Variants.
| S-EPMC8617597 | biostudies-literature

Rhapsody: predicting the pathogenicity of human missense variants.
| S-EPMC7214033 | biostudies-literature

Gene-specific machine learning for pathogenicity prediction of rare BRCA1 and BRCA2 missense variants.
| S-EPMC10307865 | biostudies-literature

Predicting pathogenicity of missense variants with weakly supervised regression.
| S-EPMC6744350 | biostudies-literature

Predicting the pathogenicity of missense variants using features derived from AlphaFold2.
| S-EPMC10203375 | biostudies-literature

MVP predicts the pathogenicity of missense variants by deep learning.
| S-EPMC7820281 | biostudies-literature

Linking Protein Stability to Pathogenicity: Predicting Clinical Significance of Single-Missense Mutations in Ocular Proteins Using Machine Learning.
| S-EPMC11546782 | biostudies-literature

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.
| S-EPMC5065685 | biostudies-literature

Gene-specific machine learning model to predict the pathogenicity of BRCA2 variants.
| S-EPMC9561395 | biostudies-literature

APOGEE 2: multi-layer machine-learning model for the interpretable prediction of mitochondrial missense variants.
| S-EPMC10439926 | biostudies-literature