Dataset Information

An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella.


ABSTRACT: Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. 
The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
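The distinction the abstract draws between a core matrix (no missing data) and a non-core matrix (some missing data allowed) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `filter_sites` and `pairwise_differences`, the `max_missing_fraction` parameter, the set of characters treated as missing, and the toy isolates are all assumptions made for illustration. The sketch keeps only variable alignment columns whose share of missing calls is within a threshold, then counts pairwise differences between isolates.

```python
from itertools import combinations

# Characters treated as missing calls (an assumption for this sketch).
MISSING = {"-", "N", "?"}


def filter_sites(alignment, max_missing_fraction=0.0):
    """Return a SNP matrix of variable columns.

    A column is kept when its fraction of missing calls does not exceed
    max_missing_fraction and at least two distinct alleles are observed.
    max_missing_fraction=0.0 gives a "core" matrix; larger values give a
    "non-core" matrix that tolerates some missing data.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")

    kept = []
    for i in range(length):
        column = [seq[i].upper() for seq in seqs]
        missing = sum(1 for call in column if call in MISSING)
        if missing / len(column) > max_missing_fraction:
            continue
        alleles = {call for call in column if call not in MISSING}
        if len(alleles) > 1:
            kept.append(i)

    return {name: "".join(alignment[name][i] for i in kept) for name in names}


def pairwise_differences(matrix):
    """Count differing sites per pair, skipping sites missing in either isolate."""
    result = {}
    for a, b in combinations(sorted(matrix), 2):
        result[(a, b)] = sum(
            1
            for x, y in zip(matrix[a].upper(), matrix[b].upper())
            if x not in MISSING and y not in MISSING and x != y
        )
    return result


# Hypothetical toy alignment: four isolates, five aligned positions.
toy = {
    "isolate_A": "ACGTA",
    "isolate_B": "ACGTC",
    "isolate_C": "AT-TC",
    "isolate_D": "ATATC",
}

core = filter_sites(toy, max_missing_fraction=0.0)
non_core = filter_sites(toy, max_missing_fraction=0.25)
```

In this toy case the third column has one missing call, so it is dropped from the core matrix and retained in the non-core matrix, which changes the pairwise counts for some isolate pairs. The same pairwise count could in principle be applied to replicate runs of one strain, although how the study computed its replicate comparisons is not described here.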

SUBMITTER: Pettengill JB 

PROVIDER: S-EPMC4201946 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella.

Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona N, Ottesen A, Rand H, Allard MW, Strain E

PeerJ, 2014-10-14


Similar Datasets

| S-EPMC3773407 | biostudies-literature
| S-EPMC3014950 | biostudies-literature
| S-EPMC3320897 | biostudies-literature
| S-EPMC17875 | biostudies-literature
| S-EPMC5680185 | biostudies-literature
| S-EPMC3995342 | biostudies-literature
2012-12-13 | E-GEOD-42864 | biostudies-arrayexpress
2012-12-13 | GSE42864 | GEO
| S-EPMC3986649 | biostudies-literature