Other

Dataset Information

Site saturation mutagenesis of 500 human protein domains


ABSTRACT: Missense variants that change the amino acid sequences of proteins cause one third of human genetic diseases. Tens of millions of missense variants exist in the current human population, with the vast majority having unknown functional consequences. Here we present the first large-scale experimental analysis of human missense variants. Using DNA synthesis and cellular selection experiments, we quantify the impact of >500,000 variants on the abundance of >500 human protein domains. This dataset, Domainome 1.0, reveals that >60% of disease-causing variants destabilize proteins. The contribution of stability to protein fitness varies across proteins and diseases, and is particularly important in recessive disorders. Combining experimental stability measurements with large language models, we annotate functionally important sites across domains. Fitting energy models to the data demonstrates the conservation of mutation effects in homologous domains and allows stability to be accurately predicted for entire domain families. Domainome 1.0 demonstrates the feasibility of assaying human protein variant effects at scale and provides a large, consistent reference dataset for clinical variant interpretation and for the training and benchmarking of computational methods.

ORGANISM(S): Saccharomyces cerevisiae

PROVIDER: GSE265942 | GEO | 2024/04/27

REPOSITORIES: GEO

Similar Datasets

2024-01-31 | GSE254639 | GEO
2023-05-22 | GSE226732 | GEO
2018-03-13 | GSE108727 | GEO
2021-09-13 | GSE159469 | GEO
2023-12-05 | GSE233827 | GEO
2024-01-31 | GSE254618 | GEO
2020-12-23 | GSE162130 | GEO
2023-10-13 | GSE237142 | GEO
2019-05-31 | GSE111394 | GEO
2019-05-31 | GSE131322 | GEO