Dataset Information

PMERGE: Computational filtering of paralogous sequences from RAD-seq data.


ABSTRACT: Restriction-site associated DNA sequencing (RAD-seq) can identify and score thousands of genetic markers from a group of samples for population-genetics studies. One challenge of de novo RAD-seq analysis is distinguishing paralogous sequence variants (PSVs) from true single-nucleotide polymorphisms (SNPs) associated with orthologous loci. Without a reference genome, true SNPs are difficult to differentiate from PSVs, and the impact of PSVs on downstream analysis remains unclear. Here, we introduce PMERGE, a network-based approach that connects fragments based on their DNA sequence similarity to identify probable PSVs. Applied to de novo RAD-seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations along the southern Newfoundland coast, the method identified 87% of the PSVs detected through alignment to the Atlantic salmon genome. Removing these paralogs altered the inferred population structure, highlighting the potential impact of paralog filtering in RAD-seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set of 242 samples from 11 locations, where it identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package.
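The network idea in the abstract can be illustrated with a minimal sketch. This is not the PMERGE implementation: the function names, the Hamming-distance similarity measure, the `max_mismatches` threshold, and the toy sequences are all assumptions chosen for illustration. The sketch links equal-length locus sequences that differ by at most a few bases and reports every connected group of two or more loci as a candidate set of paralogs.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity_clusters(loci, max_mismatches=2):
    """Group loci whose sequences are linked by <= max_mismatches differences.

    `loci` maps a locus name to its sequence (hypothetical input format).
    Returns the connected components that contain more than one locus.
    """
    # Union-find structure: each locus starts as its own component.
    parent = {name: name for name in loci}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for a, b in combinations(loci, 2):
        if hamming(loci[a], loci[b]) <= max_mismatches:
            parent[find(a)] = find(b)

    clusters = {}
    for name in loci:
        clusters.setdefault(find(name), set()).add(name)
    # Singletons have no similar partner, so they are not flagged.
    return [members for members in clusters.values() if len(members) > 1]


# Toy data, invented for this example.
loci = {
    "locus_1": "ACGTACGTAC",
    "locus_2": "ACGTACGTAT",  # 1 mismatch from locus_1
    "locus_3": "ACGAACGTAT",  # 1 mismatch from locus_2, 2 from locus_1
    "locus_4": "TTTTGGGGCC",  # unrelated
}
flagged = similarity_clusters(loci, max_mismatches=1)
print(flagged)
```

Note that locus_1 and locus_3 end up in the same group even though they differ by two bases, because both are linked through locus_2. That transitive linking is what distinguishes a network view from a simple pairwise filter. The all-pairs comparison is quadratic in the number of loci, so this form is only suitable for small illustrations.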

SUBMITTER: Nadukkalam Ravindran P 

PROVIDER: S-EPMC6065343 | biostudies-literature | 2018 Jul

REPOSITORIES: biostudies-literature

Publications

PMERGE: Computational filtering of paralogous sequences from RAD-seq data.

Praveen Nadukkalam Ravindran, Paul Bentzen, Ian R. Bradbury, Robert G. Beiko

Ecology and Evolution, 2018-06-11, issue 14


Similar Datasets

| S-EPMC5730610 | biostudies-literature
| S-EPMC5039923 | biostudies-literature
| S-EPMC6304778 | biostudies-other
| S-EPMC4940913 | biostudies-other
| S-EPMC4256836 | biostudies-literature
| S-EPMC6549478 | biostudies-literature
| S-EPMC3737546 | biostudies-literature
| S-EPMC4196881 | biostudies-literature
| S-EPMC7915145 | biostudies-literature
| S-EPMC7293188 | biostudies-literature