Dataset Information

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.


ABSTRACT: Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space, forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets.
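
To make the counting concrete, the following minimal Python sketch enumerates the 2^n genotypes of a hypercube and computes pairwise epistasis from the four genotypes named above. It is an illustration only, not code from the HypercubeME repository: the function names, the string encoding of genotypes, the example fitness values, and the additive definition of epistasis (double minus both singles plus reference) are all assumptions made here.

```python
from itertools import combinations

def hypercube_genotypes(reference, substitutions):
    """Return all 2**n genotypes obtained by applying every subset of
    the n substitutions to the reference sequence.

    `substitutions` maps a 0-based position to its new letter
    (hypothetical encoding chosen for this sketch).
    """
    items = sorted(substitutions.items())
    genotypes = []
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            letters = list(reference)
            for position, letter in subset:
                letters[position] = letter
            genotypes.append("".join(letters))
    return genotypes

def pairwise_epistasis(fitness, ref, single_a, single_b, double):
    """Additive-scale epistasis (an assumed definition): deviation of the
    double mutant from the sum of the two single-mutant effects."""
    return fitness[double] - fitness[single_a] - fitness[single_b] + fitness[ref]

# Two substitutions give a 2-dimensional hypercube with 2**2 = 4 genotypes.
square = hypercube_genotypes("AAA", {0: "B", 2: "C"})

# Made-up fitness values, for illustration only.
fitness = {"AAA": 1.0, "BAA": 0.8, "AAC": 0.7, "BAC": 0.2}
epsilon = pairwise_epistasis(fitness, "AAA", "BAA", "AAC", "BAC")
```

If any one of the four genotypes lacks a fitness measurement, the expression cannot be evaluated, which is why the dataset has to be combinatorially complete.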

Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.
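
One way to find every complete hypercube recursively is to treat each measured genotype as a 0-dimensional hypercube and to join two n-dimensional hypercubes into an (n+1)-dimensional one whenever they share the same set of substitutions and their base genotypes differ at exactly one further position. The Python sketch below follows that idea under stated assumptions; it is not the authors' implementation, and the names `grow` and `find_hypercubes`, the tuple representation of a hypercube, and the `*` mask character are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def grow(cubes):
    """Build all (k+1)-dimensional hypercubes from the k-dimensional ones.

    A cube is (base, diagonal): `base` is a vertex string and `diagonal`
    is a tuple of (position, letter_in_base, alternative_letter) entries
    sorted by position. Assumes "*" never occurs in a genotype.
    """
    buckets = defaultdict(list)
    for base, diagonal in cubes:
        # Only extend at positions beyond the last one already used,
        # so each larger cube is produced exactly once.
        first_free = diagonal[-1][0] + 1 if diagonal else 0
        for p in range(first_free, len(base)):
            masked = base[:p] + "*" + base[p + 1:]
            buckets[(diagonal, p, masked)].append(base)
    bigger = set()
    for (diagonal, p, _), bases in buckets.items():
        # Bases in one bucket differ only at position p.
        for b1, b2 in combinations(sorted(bases), 2):
            bigger.add((b1, diagonal + ((p, b1[p], b2[p]),)))
    return bigger

def find_hypercubes(cubes, dim=0, found=None):
    """Recursively collect hypercubes of every dimension above `dim`."""
    found = {} if found is None else found
    bigger = grow(cubes)
    if bigger:
        found[dim + 1] = bigger
        find_hypercubes(bigger, dim + 1, found)
    return found

# Toy data: five measured genotypes of length two.
measured = {"AA", "AB", "BA", "BB", "CA"}
found = find_hypercubes({(g, ()) for g in measured})
```

On the toy data, `found[1]` holds the six pairs of genotypes that differ by one substitution, and `found[2]` holds the single complete square spanned by A/B at both positions. The genotype "CA" contributes to no square because "CB" was not measured.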

Availability: https://github.com/ivankovlab/HypercubeME.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: Esteban LA 

PROVIDER: S-EPMC7703787 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.

Esteban LA, Lonishin LR, Bobrovskiy D, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN

Bioinformatics (Oxford, England), 2019-11-19


Similar Datasets

| S-EPMC4223573 | biostudies-other
| S-EPMC4418875 | biostudies-other
| S-EPMC4904710 | biostudies-literature
2020-08-17 | GSE153897 | GEO
| S-EPMC8760942 | biostudies-literature
| S-EPMC8493200 | biostudies-literature
| S-EPMC7669267 | biostudies-literature
| S-EPMC186653 | biostudies-literature
| S-EPMC3834737 | biostudies-literature
| S-EPMC5497850 | biostudies-other