Dataset Information

Family-specific analysis of variant pathogenicity prediction tools.


ABSTRACT: Using the presently available datasets of annotated missense variants, we ran a protein family-specific benchmarking of tools for predicting the pathogenicity of single amino acid variants. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. protein families in which its predictions prove unreliable (expected accuracy does not exceed 51% in any method). As a proof of principle, we show that choosing the optimal tool and pathogenicity threshold at a protein family-individual level allows obtaining reliable predictions in all Pfam domains (accuracy no less than 68%). A functional analysis of the sets of protein domains annotated exclusively by neutral or pathogenic mutations indicates that specific protein functions can be associated with a high or low sensitivity to mutations, respectively. The highly sensitive sets of protein domains are involved in the regulation of transcription and DNA sequence-specific transcription factor binding, while the domains that do not result in disease when mutated are responsible for mediating immune and stress responses. These results suggest that future predictors of pathogenicity and especially variant prioritization tools may benefit from considering functional annotation.
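The per-family selection of a tool and pathogenicity threshold mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the function name `best_tool_per_family`, the family and tool names, and the use of plain in-sample accuracy as the selection criterion are all assumptions made for illustration. A variant is called pathogenic when its score is at or above the threshold.

```python
from collections import defaultdict


def best_tool_per_family(records, tools):
    """Pick, for each family, the (tool, threshold, accuracy) with the highest accuracy.

    records: iterable of dicts with keys
        'family' (str), 'label' (1 = pathogenic, 0 = neutral),
        'scores' (dict mapping tool name -> float score).
    This layout is hypothetical and chosen only for this example.
    """
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec["family"]].append(rec)

    chosen = {}
    for family, rows in by_family.items():
        best = None
        for tool in tools:
            # Candidate thresholds: every observed score, plus +inf ("call everything neutral").
            thresholds = sorted({r["scores"][tool] for r in rows}) + [float("inf")]
            for thr in thresholds:
                correct = sum(
                    (r["scores"][tool] >= thr) == bool(r["label"]) for r in rows
                )
                acc = correct / len(rows)
                # Strict '>' keeps the first tool/threshold found on ties.
                if best is None or acc > best[2]:
                    best = (tool, thr, acc)
        chosen[family] = best
    return chosen


# Toy data with hypothetical family and tool names.
records = [
    {"family": "FAM_A", "label": 1, "scores": {"tool_a": 0.9, "tool_b": 0.2}},
    {"family": "FAM_A", "label": 0, "scores": {"tool_a": 0.1, "tool_b": 0.8}},
    {"family": "FAM_B", "label": 1, "scores": {"tool_a": 0.3, "tool_b": 0.7}},
    {"family": "FAM_B", "label": 0, "scores": {"tool_a": 0.6, "tool_b": 0.4}},
]
chosen = best_tool_per_family(records, ["tool_a", "tool_b"])
```

In this toy data each family is best served by a different tool, which is the situation the abstract describes. A realistic evaluation would estimate accuracy on held-out variants rather than on the data used to pick the threshold, since in-sample selection overstates performance.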

SUBMITTER: Zaucha J 

PROVIDER: S-EPMC7671395 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

Family-specific analysis of variant pathogenicity prediction tools.

Jan Zaucha, Michael Heinzinger, Svetlana Tarnovskaya, Burkhard Rost, Dmitrij Frishman

NAR Genomics and Bioinformatics, 2020-02-28, 2


Similar Datasets

S-EPMC7790749 | biostudies-literature
S-EPMC6633260 | biostudies-literature
S-EPMC5561458 | biostudies-literature
S-EPMC7873245 | biostudies-literature
S-EPMC9803319 | biostudies-literature
S-ECPF-GEOD-29137 | biostudies-other
S-EPMC8119284 | biostudies-literature
S-EPMC6315130 | biostudies-literature
S-EPMC4913391 | biostudies-literature
S-EPMC4308924 | biostudies-literature