
Dataset Information


Reconstruction of evolving gene variants and fitness from short sequencing reads.


ABSTRACT: Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.

SUBMITTER: Shen MW 

PROVIDER: S-EPMC8551035 | biostudies-literature | 2021 Nov

REPOSITORIES: biostudies-literature


Publications

Reconstruction of evolving gene variants and fitness from short sequencing reads.

Max W. Shen, Kevin T. Zhao, David R. Liu

Nature Chemical Biology, 11 October 2021, issue 11


Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R<sup>2</sup> = 0.94) and fitness using short-read data from directed evolution expe  ...[more]

Similar Datasets

| S-EPMC3270033 | biostudies-literature
| S-EPMC2577856 | biostudies-literature
| S-EPMC5530257 | biostudies-literature
| S-EPMC3995342 | biostudies-literature
| 2148709 | ecrin-mdr-crc
| S-EPMC3495688 | biostudies-literature
| S-EPMC4720449 | biostudies-literature
| S-EPMC9148508 | biostudies-literature
| S-EPMC3060140 | biostudies-literature
| S-EPMC2593571 | biostudies-literature