Dataset Information

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.


ABSTRACT: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant carrying both single mutations. For higher-order epistasis of order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space, which together form a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability: https://github.com/ivankovlab/HypercubeME.git. Supplementary data are available at Bioinformatics online.
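
To make the notion of a combinatorially complete dataset concrete, the sketch below checks by brute force whether all 2^n corners of a hypercube are present among the measured genotypes. It is an illustration only and is not the recursive HypercubeME algorithm, which is not reproduced on this page. The representation (a genotype as a frozenset of mutation labels relative to the reference), the function names, the mutation labels and the fitness values are all assumptions made for the example. The epistasis function additionally assumes fitness is on an additive scale.

```python
from itertools import combinations


def hypercube_corners(base, mutations):
    """All 2**n genotypes formed by adding any subset of `mutations` to `base`."""
    muts = sorted(mutations)
    return {
        frozenset(base) | frozenset(subset)
        for k in range(len(muts) + 1)
        for subset in combinations(muts, k)
    }


def is_combinatorially_complete(base, mutations, measured):
    """True if every corner of the hypercube has a measurement."""
    return hypercube_corners(base, mutations) <= measured


def find_hypercubes(measured, n):
    """Brute-force search for n-dimensional hypercubes (illustrative, not scalable)."""
    all_mutations = sorted(set().union(*measured))
    found = []
    for base in measured:
        candidates = [m for m in all_mutations if m not in base]
        for muts in combinations(candidates, n):
            if is_combinatorially_complete(base, muts, measured):
                found.append((base, frozenset(muts)))
    return found


def pairwise_epistasis(fitness, base, a, b):
    """Deviation of the double mutant from additivity (assumes an additive scale)."""
    base = frozenset(base)
    return (
        fitness[base | {a, b}]
        - fitness[base | {a}]
        - fitness[base | {b}]
        + fitness[base]
    )


# Hypothetical mutation labels and fitness values, invented for illustration.
fitness = {
    frozenset(): 1.0,
    frozenset({"A10G"}): 1.5,
    frozenset({"L25P"}): 2.0,
    frozenset({"A10G", "L25P"}): 3.0,
    frozenset({"S40T"}): 0.5,
}
measured = set(fitness)
squares = find_hypercubes(measured, 2)
edges = find_hypercubes(measured, 1)
eps = pairwise_epistasis(fitness, frozenset(), "A10G", "L25P")
```

Here `squares` holds the single two-dimensional hypercube spanned by the two hypothetical mutations on the reference background. The search enumerates every combination of mutations for every background, so it is only practical for tiny inputs; large mutagenesis datasets need a more efficient method such as the recursive one the abstract describes.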

SUBMITTER: Esteban LA 

PROVIDER: S-EPMC7703787 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.

Laura Avino Esteban, Lyubov R Lonishin, Daniil Bobrovskiy, Gregory Leleytner, Natalya S Bogatyreva, Fyodor A Kondrashov, Dmitry N Ivankov

Bioinformatics (Oxford, England), 2019-11-19



Similar Datasets

| S-EPMC4223573 | biostudies-other
| S-EPMC4418875 | biostudies-literature
| S-EPMC4904710 | biostudies-literature
2020-08-17 | GSE153897 | GEO
| S-EPMC10823260 | biostudies-literature
| S-EPMC7669267 | biostudies-literature
| S-EPMC8760942 | biostudies-literature
| S-EPMC8493200 | biostudies-literature
| S-EPMC5497850 | biostudies-other
| S-EPMC10181170 | biostudies-literature