
Dataset Information


Direct prediction of intrinsically disordered protein conformational properties from sequence.


ABSTRACT: Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
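The ensemble dimensions named in the abstract have standard geometric definitions, which the sketch below illustrates on a toy ensemble. It is not ALBATROSS and does not use its API: the function names, the one-bead-per-residue representation, and the random-walk ensemble are assumptions made purely for illustration. Asphericity is taken here as the simple mean of per-conformation values computed from the gyration tensor eigenvalues, which is one common convention; the polymer-scaling exponent is not covered.

```python
import numpy as np


def radius_of_gyration(coords):
    """Rg of one conformation: RMS distance of the beads from their centroid."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def end_to_end_distance(coords):
    """Distance between the first and last bead of one conformation."""
    return float(np.linalg.norm(coords[-1] - coords[0]))


def asphericity(coords):
    """Asphericity from gyration tensor eigenvalues (0 = sphere, 1 = rod)."""
    centered = coords - coords.mean(axis=0)
    gyration_tensor = centered.T @ centered / len(coords)
    l1, l2, l3 = np.linalg.eigvalsh(gyration_tensor)
    total = l1 + l2 + l3
    return float(1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / total ** 2)


def ensemble_averages(ensemble):
    """Average each per-conformation property over an ensemble.

    `ensemble` is an array of shape (n_conformations, n_beads, 3).
    """
    return {
        "radius_of_gyration": float(np.mean([radius_of_gyration(c) for c in ensemble])),
        "end_to_end_distance": float(np.mean([end_to_end_distance(c) for c in ensemble])),
        "asphericity": float(np.mean([asphericity(c) for c in ensemble])),
    }


def toy_random_walk_ensemble(n_conformations=200, n_beads=50, seed=0):
    """Hypothetical stand-in for simulation output: ideal random walks with unit steps."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_conformations, n_beads, 3))
    steps /= np.linalg.norm(steps, axis=2, keepdims=True)
    return np.cumsum(steps, axis=1)


if __name__ == "__main__":
    averages = ensemble_averages(toy_random_walk_ensemble())
    for name, value in averages.items():
        print(f"{name}: {value:.3f}")
```

In practice these quantities would be computed from simulated conformations, whereas a sequence-based predictor of the kind described above estimates them without generating the ensemble at all.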

SUBMITTER: Lotthammer JM 

PROVIDER: S-EPMC10927563 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature


Publications

Direct prediction of intrinsically disordered protein conformational properties from sequence.

Jeffrey M Lotthammer, Garrett M Ginell, Daniel Griffith, Ryan J Emenecker, Alex S Holehouse

Nature Methods, 2024 Jan 31, issue 3



Similar Datasets

| S-EPMC5675102 | biostudies-other
| S-EPMC4008819 | biostudies-literature
| S-EPMC8178997 | biostudies-literature
| S-EPMC10262780 | biostudies-literature
| S-EPMC5857923 | biostudies-literature
| S-EPMC9770960 | biostudies-literature
| S-EPMC7031737 | biostudies-literature
| S-EPMC11352843 | biostudies-literature
| S-EPMC10634714 | biostudies-literature
| S-EPMC3776332 | biostudies-literature