Dataset Information

Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data.


ABSTRACT: Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
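The abstract names a supervised, stochastic gradient boosting learner but does not spell out how one works. The following is a minimal sketch of the general technique on synthetic data, not Envision's actual implementation. The single numeric feature, the toy effect scores, the decision-stump base learner, and every hyperparameter value (`n_rounds`, `rate`, `subsample`) are illustrative assumptions and do not come from the record.

```python
import random

def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left_mean, right_mean)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no valid split between identical feature values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, lm, rm)
    if best is None:  # every feature value identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return (0.0, m, m)
    return best[1:]

def fit_sgb(xs, ys, n_rounds=50, rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared error (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)          # initial prediction: mean of the targets
    preds = [base] * len(ys)
    stumps = []
    n_sub = max(2, int(subsample * len(xs)))
    for _ in range(n_rounds):
        # "Stochastic": each round fits on a random subsample drawn without replacement.
        idx = rng.sample(range(len(xs)), n_sub)
        # For squared error, the negative gradient is the plain residual.
        t, lm, rm = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append((t, lm, rm))
        # Shrink the new learner's contribution by the learning rate.
        preds = [p + rate * (lm if x <= t else rm) for p, x in zip(preds, xs)]
    return base, rate, stumps

def predict(model, x):
    base, rate, stumps = model
    return base + rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Synthetic stand-in for (feature, quantitative effect score) pairs.
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [(1.0 if x < 0.5 else 0.2) + data_rng.gauss(0, 0.05) for x in xs]

model = fit_sgb(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
model_mse = mse(ys, [predict(model, x) for x in xs])
print(f"baseline MSE={baseline_mse:.4f}  boosted MSE={model_mse:.4f}")
```

Each round fits a weak learner to the residuals of a random subsample and adds a shrunken copy to the ensemble, which is what distinguishes stochastic gradient boosting from the plain form. A real predictor would use many features per variant and evaluate on held-out data rather than on the training set, as the abstract describes for the TP53 dataset.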

SUBMITTER: Gray VE 

PROVIDER: S-EPMC5799033 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data.

Gray VE, Hause RJ, Luebeck J, Shendure J, Fowler DM

Cell Systems, 2017-12-06, issue 1


Similar Datasets

| S-EPMC8138887 | biostudies-literature
| S-EPMC7099065 | biostudies-literature
| S-EPMC6221071 | biostudies-literature
2014-09-17 | GSE59408 | GEO
2014-09-17 | E-GEOD-59408 | biostudies-arrayexpress
| S-EPMC7689672 | biostudies-literature
| S-EPMC2453687 | biostudies-literature
2023-04-12 | GSE189788 | GEO
| S-EPMC3383642 | biostudies-other
| S-EPMC5586385 | biostudies-literature