Dataset Information

Applications of random forest feature selection for fine-scale genetic population assignment.


ABSTRACT: Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90%, using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
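The FST-ranking baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which FST estimator the authors used, so the code below assumes a simple Nei-style FST = (H_T - H_S) / H_T for biallelic SNPs with equally weighted populations. The function names, the input layout and the toy allele frequencies are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def nei_fst(freqs: Sequence[float]) -> float:
    """Nei-style FST for one biallelic SNP.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled populations.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Locus is fixed everywhere, so it carries no information.
        return 0.0
    # Mean expected heterozygosity within populations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n
    return (h_t - h_s) / h_t


def rank_snps_by_fst(
    allele_freqs: Dict[str, Sequence[float]], panel_size: int
) -> List[str]:
    """Return the `panel_size` SNP names with the highest FST."""
    scores = {snp: nei_fst(f) for snp, f in allele_freqs.items()}
    ranked = sorted(scores, key=lambda snp: scores[snp], reverse=True)
    return ranked[:panel_size]


# Toy data: allele frequencies in two populations (made up for illustration).
toy = {
    "snp_a": [0.10, 0.90],  # strongly differentiated
    "snp_b": [0.50, 0.50],  # no differentiation
    "snp_c": [0.30, 0.60],  # weakly differentiated
}
panel = rank_snps_by_fst(toy, panel_size=2)
print(panel)
```

A ranking like this scores each SNP independently of the others, whereas random forest importance scores come from a model that uses many markers jointly. The abstract reports that the random forest-based panels assigned individuals more accurately than FST-selected panels of similar size.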

SUBMITTER: Sylvester EVA 

PROVIDER: S-EPMC5775496 | biostudies-literature | 2018 Feb

REPOSITORIES: biostudies-literature

Publications

Applications of random forest feature selection for fine-scale genetic population assignment.

Sylvester EVA, Bentzen P, Bradbury IR, Clément M, Pearce J, Horne J, Beiko RG

Evolutionary Applications, 2017-09-14, issue 2



Similar Datasets

| S-EPMC3445488 | biostudies-literature
| S-EPMC4206426 | biostudies-literature
| S-EPMC4632200 | biostudies-literature
| S-EPMC3532813 | biostudies-literature