Dataset Information

Automated analysis of phylogenetic clusters.

ABSTRACT: BACKGROUND: As sequence data sets used for the investigation of pathogen transmission patterns increase in size, automated tools and standardized methods for cluster analysis have become necessary. We have developed an automated Cluster Picker which identifies monophyletic clades meeting user-input criteria for bootstrap support and maximum genetic distance within large phylogenetic trees. A second tool, the Cluster Matcher, automates the process of linking genetic data to epidemiological or clinical data, and matches clusters between runs of the Cluster Picker. RESULTS: We explore the effect of different bootstrap and genetic distance thresholds on clusters identified in a data set of publicly available HIV sequences, and compare these results to those of a previously published tool for cluster identification. To demonstrate their utility, we then use the Cluster Picker and Cluster Matcher together to investigate how clusters in the data set changed over time. We find that clusters containing sequences from more than one UK location at the first time point (multiple origin) were significantly more likely to grow than those representing only a single location. CONCLUSIONS: The Cluster Picker and Cluster Matcher can rapidly process phylogenetic trees containing tens of thousands of sequences. Together these tools will facilitate comparisons of pathogen transmission dynamics between studies and countries.
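
The cluster criterion described in the abstract (a monophyletic clade whose bootstrap support and maximum within-clade genetic distance both pass user-set thresholds) can be illustrated with a minimal Python sketch. This is not the Cluster Picker's own code: the `Node` class, the `pick_clusters` function, the top-down search order, and the toy tree and distances are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical tree node; `support` is the bootstrap support (0-1) of the clade rooted here."""
    name: str = ""
    support: float = 0.0
    children: list = field(default_factory=list)


def leaves(node):
    """Return the tip names under a node."""
    if not node.children:
        return [node.name]
    tips = []
    for child in node.children:
        tips.extend(leaves(child))
    return tips


def max_pairwise_distance(names, dist):
    """Largest genetic distance between any two tips in `names`."""
    return max(dist[frozenset(pair)] for pair in combinations(names, 2))


def pick_clusters(node, dist, min_support, max_distance):
    """Top-down search: report the first clade on each path that passes both thresholds."""
    if not node.children:
        return []  # a single tip is not a cluster
    tips = leaves(node)
    if node.support >= min_support and max_pairwise_distance(tips, dist) <= max_distance:
        return [sorted(tips)]
    clusters = []
    for child in node.children:
        clusters.extend(pick_clusters(child, dist, min_support, max_distance))
    return clusters


# Toy tree (illustrative only): ((A,B)0.95,(C,(D,E)0.99)0.60)
tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.60, children=[
        Node("C"),
        Node(support=0.99, children=[Node("D"), Node("E")]),
    ]),
])

# Made-up pairwise genetic distances (substitutions per site).
raw = {
    ("A", "B"): 0.01, ("A", "C"): 0.10, ("A", "D"): 0.12, ("A", "E"): 0.12,
    ("B", "C"): 0.10, ("B", "D"): 0.12, ("B", "E"): 0.12,
    ("C", "D"): 0.08, ("C", "E"): 0.08, ("D", "E"): 0.02,
}
dist = {frozenset(pair): d for pair, d in raw.items()}

strict = pick_clusters(tree, dist, min_support=0.90, max_distance=0.045)
relaxed = pick_clusters(tree, dist, min_support=0.50, max_distance=0.10)
print(strict)   # [['A', 'B'], ['D', 'E']]
print(relaxed)  # [['A', 'B'], ['C', 'D', 'E']]
```

Relaxing both thresholds merges tip C into the D/E clade, which mirrors the kind of threshold sensitivity the abstract reports exploring.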

SUBMITTER: Ragonnet-Cronin M

PROVIDER: S-EPMC4228337 | biostudies-literature | 2013 Nov

REPOSITORIES: biostudies-literature


Publications

Automated analysis of phylogenetic clusters.

Manon Ragonnet-Cronin, Emma Hodcroft, Stéphane Hué, Esther Fearnhill, Valerie Delpech, Andrew J. Leigh Brown, Samantha Lycett

BMC Bioinformatics, 2013-11-06

PMID: 24191891

Similar Datasets

Project description: Current bacterial taxonomy is mostly based on phenotypic criteria, which may yield misleading interpretations in classification and identification. As a result, bacteria not closely related may be grouped together as a genus or species. For pathogenic bacteria, incorrect classification or misidentification could be disastrous. There is therefore an urgent need for appropriate methodologies to classify bacteria according to phylogeny and corresponding new approaches that permit their rapid and accurate identification. For this purpose, we have devised a strategy enabling us to resolve phylogenetic clusters of bacteria by comparing their genome structures. These structures were revealed by cleaving genomic DNA with the endonuclease I-CeuI, which cuts within the 23S ribosomal DNA (rDNA) sequences, and by mapping the resulting large DNA fragments with pulsed-field gel electrophoresis. We tested this experimental system on two representative bacterial genera: Salmonella and Pasteurella. Among Salmonella spp., I-CeuI mapping revealed virtually indistinguishable genome structures, demonstrating a high degree of structural conservation. Consistent with this, 16S rDNA sequences are also highly conserved among the Salmonella spp. In marked contrast, the Pasteurella strains have very different genome structures among and even within individual species. The divergence of Pasteurella was also reflected in 16S rDNA sequences and far exceeded that seen between Escherichia and Salmonella. Based on this diversity, the Pasteurella haemolytica strains we analyzed could be divided into 14 phylogenetic groups and the Pasteurella multocida strains could be divided into 9 groups. If criteria for defining bacterial species or genera similar to those used for Salmonella and Escherichia coli were applied, the striking phylogenetic diversity would allow bacteria in the currently recognized species of P. multocida and P. haemolytica to be divided into different species, genera, or even higher ranks. On the other hand, strains of Pasteurella ureae and Pasteurella pneumotropica are very similar to those of P. multocida in both genome structure and 16S rDNA sequence and should be regarded as strains within this species. We conclude that large-scale genome structure can be a sensitive indicator of phylogenetic relationships and that, therefore, I-CeuI-based genomic mapping is an efficient tool for probing the phylogenetic status of bacteria.
