
Dataset Information


Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference.


ABSTRACT: Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.
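The preferential sampling model described in the abstract can be illustrated with a minimal simulation sketch. It assumes, for illustration only, a sampling intensity of the form lambda(t) = c * Ne(t)^beta and draws sampling times by thinning a homogeneous Poisson process. The names `effective_pop_size`, `sample_times`, `c`, `beta`, and `ne_max`, as well as the seasonal trajectory itself, are hypothetical and are not taken from the paper or its software.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical seasonal trajectory (illustrative, not from the paper)."""
    return 10.0 + 9.0 * math.cos(2.0 * math.pi * t)


def sample_times(ne, c, beta, t_end, ne_max, rng):
    """Draw sampling times on [0, t_end] by thinning.

    Assumed intensity: lambda(t) = c * ne(t) ** beta, with beta >= 0.
    ne_max must be an upper bound on ne(t) over [0, t_end], so that
    c * ne_max ** beta dominates the intensity everywhere.
    """
    lam_max = c * ne_max ** beta
    times = []
    t = 0.0
    while True:
        # Candidate event from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            return times
        # Keep the candidate with probability lambda(t) / lam_max.
        if rng.random() * lam_max <= c * ne(t) ** beta:
            times.append(t)


if __name__ == "__main__":
    rng = random.Random(2016)
    preferential = sample_times(effective_pop_size, 1.0, 2.0, 10.0, 19.0, rng)
    uniform = sample_times(effective_pop_size, 100.0, 0.0, 10.0, 19.0, rng)
    print(len(preferential), "preferential sampling times")
    print(len(uniform), "sampling times independent of population size")
```

Setting `beta` to 0 makes the intensity constant, which corresponds to sampling times that do not depend on effective population size. With `beta` greater than 0, the sampling times cluster where the trajectory is large, which is the dependence the abstract says existing methods implicitly ignore.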

SUBMITTER: Karcher MD 

PROVIDER: S-EPMC4777449 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature


Publications

Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference.

Karcher MD, Palacios JA, Bedford T, Suchard MA, Minin VN

PLoS Computational Biology, 2016-03-03, issue 3



Similar Datasets

S-EPMC4223917 | biostudies-literature
S-EPMC5863931 | biostudies-literature
S-EPMC5400386 | biostudies-literature
S-EPMC3990497 | biostudies-literature
S-EPMC6258546 | biostudies-literature
S-EPMC5528224 | biostudies-other
S-EPMC7253210 | biostudies-literature
S-EPMC4540972 | biostudies-literature
S-EPMC166164 | biostudies-literature
S-EPMC8382123 | biostudies-literature