Dataset Information

DeCoDe: degenerate codon design for complete protein-coding DNA libraries.

ABSTRACT:

Motivation

High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created in several ways, including targeted or random mutagenesis of a template protein sequence and direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants, and, despite advances in large-scale DNA synthesis, synthesizing each desired DNA template individually is often prohibitively expensive. Consequently, many protein-screening libraries rely on degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. When the target sequences covary at multiple positions, however, choosing DCs becomes much harder: because the DCs at different positions combine independently, the resulting library contains many undesired variants that can quickly outstrip screening capacity.
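
To make the degenerate-codon idea concrete, the following sketch (illustrative only, not code from DeCoDe) expands a DC written in IUPAC nucleotide codes into its concrete codons and the residues they encode under the standard genetic code, then takes the Cartesian product across positions. It assumes the standard genetic code and that each position's base mixture is incorporated independently; the helper names `expand_codon`, `encoded_residues`, and `library_proteins` are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes: the base mixture each DC letter stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(dc):
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in dc))]


def encoded_residues(dc):
    """Residues encoded by a degenerate codon; '*' marks a stop codon."""
    return {CODON_TABLE[codon] for codon in expand_codon(dc)}


def library_proteins(dcs):
    """Protein variants produced when every position's DC mixes independently."""
    per_position = [sorted(encoded_residues(dc)) for dc in dcs]
    return {"".join(combo) for combo in product(*per_position)}
```

For example, `NNK` expands to 32 codons encoding all 20 amino acids plus the TAG (amber) stop. If the wanted variants are the covarying pair KE and EK, placing `RAA` (AAA = Lys, GAA = Glu) at both positions also yields the unwanted KK and EE, which is the over-production problem described above.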

Results

We introduce degenerate codon design (DeCoDe), a novel algorithm for total DC library optimization based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm capable of encoding mixed-length protein libraries. We anticipate that DeCoDe will be broadly useful for a variety of library generation problems, ranging from protein engineering efforts that leverage mutual information to the reconstruction of ancestral protein states.
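
DeCoDe formulates library design as an integer linear program; the sketch below does not reproduce that formulation. As a much smaller stand-in, it brute-forces the single-position, single-DC version of the problem: scan all 15^3 = 3,375 IUPAC degenerate codons and keep the one that encodes every target residue with the fewest off-target residues (stop codons count as off-target), breaking ties by fewer distinct codons and then alphabetically. The objective, the tie-breaking rule, and the function name `best_single_position_dc` are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes: the 15 base mixtures available at each DC position.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_residues(dc):
    """Residues ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE["".join(c)] for c in product(*(IUPAC[b] for b in dc))}


def best_single_position_dc(targets):
    """Hypothetical helper: exhaustively pick one DC that covers `targets`.

    Minimizes (off-target residues, distinct codons, DC string) and returns
    (dc, number_of_off_target_residues).
    """
    targets = set(targets)
    best = None
    for letters in product(IUPAC, repeat=3):
        dc = "".join(letters)
        residues = encoded_residues(dc)
        if not targets <= residues:
            continue  # this DC misses at least one wanted residue
        n_codons = len(IUPAC[dc[0]]) * len(IUPAC[dc[1]]) * len(IUPAC[dc[2]])
        key = (len(residues - targets), n_codons, dc)
        if best is None or key < best:
            best = key
    if best is None:
        raise ValueError(f"no degenerate codon encodes {sorted(targets)}")
    off_target, _, dc = best
    return dc, off_target
```

For the target set {Lys, Glu}, `best_single_position_dc({"K", "E"})` returns `("RAA", 0)`. Exhaustive search is practical here only because a single position has 3,375 candidates; jointly choosing DCs across many covarying positions, which DeCoDe addresses with integer linear programming, is a far larger combinatorial problem.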

Availability and implementation

github.com/OrensteinLab/DeCoDe.

Contact

yaronore@bgu.ac.il.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Shimko TC

PROVIDER: S-EPMC7267834 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

DeCoDe: degenerate codon design for complete protein-coding DNA libraries.

Shimko TC, Fordyce PM, Orenstein Y

Bioinformatics (Oxford, England), 2020 Jun 1; issue 11

PMID: 32176271

Similar Datasets

Scalable design of orthogonal DNA barcode libraries.
| S-EPMC11208133 | biostudies-literature

SwiftLib: rapid degenerate-codon-library optimization through dynamic programming.
| S-EPMC4357694 | biostudies-literature

Increasing recombinant protein production in E. coli via FACS-based selection of N-terminal coding DNA libraries.
| S-EPMC11880969 | biostudies-literature

Structural space of protein-protein interfaces is degenerate, close to complete, and highly connected.
| S-EPMC3012513 | biostudies-literature

Design, preparation, and selection of DNA-encoded dynamic libraries.
| S-EPMC5510007 | biostudies-literature

Degenerate codon mixing for PCR-based manipulation of highly repetitive sequences.
| S-EPMC5870680 | biostudies-literature

Design, Construction, and Screening of Diversified Pyrimidine-Focused DNA-Encoded Libraries.
| S-EPMC10424316 | biostudies-literature

Design and validation of DNA libraries for multiplexing proximity ligation assays.
| S-EPMC4227721 | biostudies-literature

DPPrimer - A Degenerate PCR Primer Design Tool.
| S-EPMC3842581 | biostudies-literature

Codon-triplet context unveils unique features of the Candida albicans protein coding genome.
| S-EPMC2244636 | biostudies-literature