
Dataset Information


IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions.


ABSTRACT:

Motivation

Despite lacking a folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in many biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of next-generation sequencing (NGS) has driven the development of disease-association prediction methods for decades. However, these methods still perform poorly on nsSNVs in IDRs, possibly because nsSNVs from structured regions dominate their training data. There is therefore a strong need for a disease-association predictor built specifically for nsSNVs in IDRs, with better performance on them.

Results

We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained with three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. Evaluation on two testing datasets shows that all three prediction models significantly outperform 17 other popular general-purpose predictors, achieving accuracy (ACC) between 0.856 and 0.868 and Matthews correlation coefficient (MCC) between 0.713 and 0.737. IDRMutPred will prioritize disease-associated germline nsSNVs in IDRs more reliably than general predictors.
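To make the two reported metrics concrete, the sketch below computes ACC and MCC from the four confusion-matrix counts of a binary classifier, assuming disease-associated nsSNVs are the positive class and neutral nsSNVs the negative class. These are the standard textbook definitions, not code from IDRMutPred, and the counts in the usage lines are hypothetical, not taken from the paper's testing datasets.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """ACC: fraction of all variants that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC: correlation between predicted and true labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when any row or column of the confusion matrix is empty
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not from the IDRMutPred evaluation):
# tp = disease-associated predicted as disease-associated, tn = neutral predicted as neutral
tp, tn, fp, fn = 86, 85, 15, 14
print(round(accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

Unlike ACC, MCC uses all four cells of the confusion matrix and stays informative when the disease-associated and neutral classes are imbalanced, which is one reason it is often reported alongside ACC for variant-effect predictors.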

Availability and implementation

The software is freely available at http://www.wdspdb.com/IDRMutPred.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Zhou JB 

PROVIDER: S-EPMC7755418 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature


Publications

IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions.

Zhou Jing-Bo, Xiong Yao, An Ke, Ye Zhi-Qiang, Wu Yun-Dong

Bioinformatics (Oxford, England), 2020 Dec 1, 20



Similar Datasets

| S-EPMC9250585 | biostudies-literature
| S-EPMC6174354 | biostudies-literature
| S-EPMC3949125 | biostudies-literature
| S-EPMC4474717 | biostudies-literature
| S-EPMC4061446 | biostudies-literature
| S-EPMC6954741 | biostudies-literature
| S-EPMC5314938 | biostudies-literature
| S-EPMC9649497 | biostudies-literature
| S-EPMC3355724 | biostudies-literature
| S-EPMC6699704 | biostudies-literature