Dataset Information

Cluster learning-assisted directed evolution.


ABSTRACT: Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) through expensive and time-consuming screening or selection across a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling with supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions from supervised learning models improve the final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library, in five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, compared with 18.6% and 7.2% for random-sampling-based MLDE.
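
The numbers in the abstract can be checked with a short sketch: a four-site library over the 20 standard amino acids has 20^4 = 160,000 variants, and 480 screened sequences in five equal batches means 96 per batch (0.3% of the library). The Python below also illustrates, under simplifying assumptions, what one cluster-guided sampling step might look like. The function names, the precomputed `clusters` mapping, and the weighting rule (cluster weight = mean measured fitness, assumed non-negative) are hypothetical choices for illustration and are not taken from the CLADE implementation; the hierarchical clustering itself and the final supervised-learning stage are not shown.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def library_size(n_sites, alphabet=AMINO_ACIDS):
    """Number of variants in a full combinatorial library over n_sites positions."""
    return len(alphabet) ** n_sites


def batch_plan(total_screened, n_batches):
    """Split a screening budget into equal experimental batches."""
    if total_screened % n_batches != 0:
        raise ValueError("budget does not divide into equal batches")
    return [total_screened // n_batches] * n_batches


def cluster_guided_batch(clusters, measured, batch_size, rng):
    """Pick up to batch_size unscreened variants, favouring promising clusters.

    clusters: dict mapping cluster id -> list of variant strings (assumed given).
    measured: dict mapping variant -> non-negative fitness already screened.
    Hypothetical rule: a cluster is drawn with probability proportional to the
    mean fitness of its screened members; clusters with no measurements fall
    back to the overall mean (or 1.0 if nothing has been screened yet).
    """
    overall = sum(measured.values()) / len(measured) if measured else 1.0
    weights = {}
    pool = {}
    for cid, members in clusters.items():
        scores = [measured[v] for v in members if v in measured]
        weights[cid] = sum(scores) / len(scores) if scores else overall
        pool[cid] = [v for v in members if v not in measured]

    picked = []
    while len(picked) < batch_size:
        live = [cid for cid in pool if pool[cid]]
        if not live:
            break  # every variant has been screened or picked
        # Small offset keeps the total weight positive if all means are zero.
        w = [weights[cid] + 1e-9 for cid in live]
        cid = rng.choices(live, weights=w, k=1)[0]
        picked.append(pool[cid].pop(rng.randrange(len(pool[cid]))))
    return picked


if __name__ == "__main__":
    print(library_size(4))     # size of a four-site library
    print(batch_plan(480, 5))  # variants per experimental batch
    toy_clusters = {0: ["AAAA", "AAAC"], 1: ["CCCC", "CCCA"]}
    toy_measured = {"AAAA": 1.0}
    print(cluster_guided_batch(toy_clusters, toy_measured, 2, random.Random(0)))
```

In a full workflow along the lines the abstract describes, the variants returned by each batch would be screened experimentally, added to `measured`, and the step repeated; the accumulated measurements would then serve as the training set for a supervised model.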

SUBMITTER: Qiu Y 

PROVIDER: S-EPMC9267417 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature

Publications

Cluster learning-assisted directed evolution.

Yuchi Qiu, Jian Hu, Guo-Wei Wei

Nature Computational Science, 2021-12-09, issue 12

Similar Datasets

| S-EPMC6500146 | biostudies-literature
| S-EPMC9855281 | biostudies-literature
| S-EPMC10804518 | biostudies-literature
| S-EPMC4410319 | biostudies-literature
| S-EPMC10582076 | biostudies-literature
| S-EPMC8225853 | biostudies-literature
| S-EPMC11838293 | biostudies-literature
| S-EPMC10856788 | biostudies-literature
| S-EPMC8025677 | biostudies-literature
| S-EPMC7003928 | biostudies-literature