Dataset Information

Inferring fitness landscapes by regression produces biased estimates of epistasis.

ABSTRACT: The genotype-fitness map plays a fundamental role in shaping the dynamics of evolution. However, it is difficult to directly measure a fitness landscape in practice, because the number of possible genotypes is astronomical. One approach is to sample as many genotypes as possible, measure their fitnesses, and fit a statistical model of the landscape that includes additive and pairwise interactive effects between loci. Here, we elucidate the pitfalls of using such regressions by studying artificial but mathematically convenient fitness landscapes. We identify two sources of bias inherent in these regression procedures, each of which tends to underestimate high fitnesses and overestimate low fitnesses. We characterize these biases for random sampling of genotypes as well as samples drawn from a population under selection in the Wright-Fisher model of evolutionary dynamics. We show that common measures of epistasis, such as the number of monotonically increasing paths between ancestral and derived genotypes, the prevalence of sign epistasis, and the number of local fitness maxima, are distorted in the inferred landscape. As a result, the inferred landscape will provide systematically biased predictions for the dynamics of adaptation. We identify the same biases in a computational RNA-folding landscape as well as regulatory sequence binding data treated with the same fitting procedure. Finally, we present a method to ameliorate these biases in some cases.
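
The regression approach described in the abstract (an additive term per locus plus one interaction term per pair of loci) can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes biallelic loci coded 0/1, noise-free fitness values, and ordinary least squares via NumPy. The function names `design_matrix`, `fit_pairwise_landscape`, and `predict`, and the toy landscape itself, are hypothetical.

```python
import itertools
import numpy as np

def design_matrix(genotypes):
    """Columns: intercept, one column per locus, one column per pair of loci."""
    g = np.asarray(genotypes, dtype=float)
    n_loci = g.shape[1]
    pairs = itertools.combinations(range(n_loci), 2)
    pair_cols = [g[:, i] * g[:, j] for i, j in pairs]
    return np.column_stack([np.ones(len(g)), g] + pair_cols)

def fit_pairwise_landscape(genotypes, fitnesses):
    """Least-squares estimate of the additive and pairwise coefficients."""
    X = design_matrix(genotypes)
    y = np.asarray(fitnesses, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(genotypes, coef):
    """Fitness predicted by the fitted additive-plus-pairwise model."""
    return design_matrix(genotypes) @ coef

# Toy landscape (assumption): 6 biallelic loci, so 2**6 = 64 genotypes.
n_loci = 6
all_genotypes = np.array(list(itertools.product([0, 1], repeat=n_loci)))
rng = np.random.default_rng(0)
n_coef = 1 + n_loci + n_loci * (n_loci - 1) // 2
true_coef = rng.normal(size=n_coef)

# Add one three-way interaction that the pairwise model cannot represent.
pairwise_part = predict(all_genotypes, true_coef)
three_way = all_genotypes[:, 0] * all_genotypes[:, 1] * all_genotypes[:, 2]
true_fitness = pairwise_part + 2.0 * three_way

# Fit to a random subset of genotypes, then predict the whole landscape.
sample_idx = rng.choice(len(all_genotypes), size=40, replace=False)
coef_hat = fit_pairwise_landscape(all_genotypes[sample_idx],
                                  true_fitness[sample_idx])
inferred_fitness = predict(all_genotypes, coef_hat)
residuals = inferred_fitness - true_fitness
```

Comparing `inferred_fitness` with `true_fitness` for the fittest and least fit genotypes is one way to look for the kind of distortion the abstract describes. The sketch only sets up that comparison and does not reproduce the paper's analysis.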

SUBMITTER: Otwinowski J

PROVIDER: S-EPMC4050575 | biostudies-literature | 2014 Jun

REPOSITORIES: biostudies-literature

Publications

Inferring fitness landscapes by regression produces biased estimates of epistasis.

Jakub Otwinowski, Joshua B. Plotkin

Proceedings of the National Academy of Sciences of the United States of America, 2014-05-19, issue 22

PMID: 24843135

Similar Datasets

The distribution of epistasis on simple fitness landscapes. | S-EPMC6501363 | biostudies-literature

Biased estimates of diminishing-returns epistasis? Empirical evidence revisited. | S-EPMC4256761 | biostudies-literature

A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory. | S-EPMC4286684 | biostudies-literature

Inferring fitness landscapes and selection on phenotypic states from single-cell genealogical data. | S-EPMC5360348 | biostudies-literature

Inferring the shape of global epistasis. | S-EPMC6094095 | biostudies-literature

Efficient Determination of Free Energy Landscapes in Multiple Dimensions from Biased Umbrella Sampling Simulations Using Linear Regression. | S-EPMC4894281 | biostudies-literature

Biased evaluations emerge from inferring hidden causes. | S-EPMC8423857 | biostudies-literature

Naturally segregating loci exhibit epistasis for fitness. | S-EPMC4571682 | biostudies-literature

Additive Phenotypes Underlie Epistasis of Fitness Effects. | S-EPMC5753867 | biostudies-literature

Fitness Landscapes of Functional RNAs. | S-EPMC4598650 | biostudies-literature