
Dataset Information


Evaluating the use of ABBA-BABA statistics to locate introgressed loci.


ABSTRACT: Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson's D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole-genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic ƒ(d), a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. ƒ(d) is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and ƒ(d) outliers tend to cluster in regions of low absolute divergence (d(XY)), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
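
As a rough illustration of the two statistics named in the abstract, the sketch below computes D and ƒ(d) for a single genomic window from per-site derived-allele frequencies. It is not taken from this dataset or from the authors' code. The function names, the list-based input, and the assumption that the outgroup is fixed for the ancestral allele are hypothetical choices made here. The ƒ(d) denominator follows the usual description of the statistic, in which the donor frequency at each site is the higher of the P2 and P3 derived-allele frequencies.

```python
from typing import Sequence, Tuple


def abba_baba_terms(p1: float, p2: float, p3: float,
                    p_out: float = 0.0) -> Tuple[float, float]:
    """Expected ABBA and BABA pattern weights at one site.

    Arguments are derived-allele frequencies in P1, P2, P3 and the outgroup.
    Assumption: the outgroup defaults to fixed-ancestral (frequency 0).
    """
    abba = (1 - p1) * p2 * p3 * (1 - p_out)
    baba = p1 * (1 - p2) * p3 * (1 - p_out)
    return abba, baba


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float]) -> float:
    """Patterson's D over a window: sum(ABBA - BABA) / sum(ABBA + BABA)."""
    num = 0.0
    den = 0.0
    for a, b, c in zip(p1, p2, p3):
        abba, baba = abba_baba_terms(a, b, c)
        num += abba - baba
        den += abba + baba
    # No informative sites in the window: the ratio is undefined.
    return num / den if den else float("nan")


def f_d(p1: Sequence[float], p2: Sequence[float],
        p3: Sequence[float]) -> float:
    """f_d over a window.

    The numerator is the same ABBA - BABA sum used by D. The denominator
    recomputes that sum with P2 and P3 both replaced by a donor frequency,
    taken per site as the larger of the P2 and P3 frequencies.
    """
    num = 0.0
    den = 0.0
    for a, b, c in zip(p1, p2, p3):
        abba, baba = abba_baba_terms(a, b, c)
        num += abba - baba
        donor = max(b, c)
        abba_d, baba_d = abba_baba_terms(a, donor, donor)
        den += abba_d - baba_d
    return num / den if den else float("nan")


if __name__ == "__main__":
    # Toy frequencies for a two-site window (made up for illustration).
    P1 = [0.0, 0.0]
    P2 = [1.0, 0.5]
    P3 = [1.0, 1.0]
    print("D   =", patterson_d(P1, P2, P3))
    print("f_d =", f_d(P1, P2, P3))
```

In this toy window every informative site is an ABBA site, so D reaches its maximum even though P2 carries the derived allele at only half frequency at the second site, while ƒ(d) is lower because its denominator models complete replacement by the donor. Real analyses would also need to handle missing data and polarisation of alleles, which this sketch leaves out.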

SUBMITTER: Martin SH 

PROVIDER: S-EPMC4271521 | biostudies-literature | 2015 Jan

REPOSITORIES: biostudies-literature

Publications

Evaluating the use of ABBA-BABA statistics to locate introgressed loci.

Martin SH, Davey JW, Jiggins CD

Molecular Biology and Evolution, 2014-09-22, issue 1



Similar Datasets

| S-EPMC4756653 | biostudies-literature
| S-EPMC1459736 | biostudies-literature
| S-EPMC7062050 | biostudies-literature
| S-EPMC5933812 | biostudies-literature
| S-EPMC7029882 | biostudies-literature
| S-EPMC3533942 | biostudies-literature
| S-EPMC8550643 | biostudies-literature
| S-EPMC7502413 | biostudies-literature
| S-EPMC8418218 | biostudies-literature
| S-EPMC7864580 | biostudies-literature