Dataset Information

Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile.

ABSTRACT: Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
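The core idea of hash-cgMLST, giving each allele an identifier derived from its own sequence so that no central numbering database is needed, can be illustrated with a short sketch. It assumes MD5 as the hash function and upper-casing as the only sequence normalization; the study's actual hash function, normalization and handling of missing or partial genes are not given on this page, and the function names (`allele_hash`, `hash_profile`, `gene_differences`, `close_relatives`) are hypothetical. Two labs hashing the same allele sequence independently obtain the same identifier, which is what makes the profiles comparable without coordination.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# A profile maps gene name -> allele hash, or None if the gene was not called.
Profile = Dict[str, Optional[str]]


def allele_hash(sequence: str) -> str:
    """Map an allele sequence to a reproducible hash string.

    Assumption: MD5 over the upper-cased sequence; the published method may
    differ (e.g. in hash function or strand normalization).
    """
    return hashlib.md5(sequence.strip().upper().encode("ascii")).hexdigest()


def hash_profile(alleles: Dict[str, Optional[str]]) -> Profile:
    """Convert {gene: allele sequence or None} into {gene: hash or None}."""
    return {gene: allele_hash(seq) if seq else None for gene, seq in alleles.items()}


def gene_differences(a: Profile, b: Profile) -> int:
    """Count genes called in both profiles whose allele hashes differ.

    Genes missing from either profile are skipped rather than counted
    (an assumption; other missing-data rules are possible).
    """
    return sum(
        1
        for gene, h in a.items()
        if h is not None and b.get(gene) is not None and b[gene] != h
    )


def close_relatives(query: Profile, database: Dict[str, Profile],
                    max_diff: int = 2) -> List[Tuple[str, int]]:
    """Return sorted (name, differences) pairs for profiles within max_diff genes."""
    hits = []
    for name, profile in database.items():
        d = gene_differences(query, profile)
        if d <= max_diff:
            hits.append((name, d))
    return sorted(hits)


# Toy example with made-up sequences (not data from the study).
reference = hash_profile({"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": "ATGGGG"})
query = hash_profile({"geneA": "atgaaa", "geneB": "ATGCCT", "geneC": None})
print(gene_differences(reference, query))  # prints 1
print(close_relatives(query, {"ref": reference, "self": query}))  # prints [('ref', 1), ('self', 0)]
```

Searching a collection here is a plain linear scan of hash comparisons; the default `max_diff=2` only echoes the ≤2 gene-difference threshold discussed in the abstract and is a tunable parameter, not a recommended cut-off.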

SUBMITTER: Eyre DW

PROVIDER: S-EPMC6935933 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature


Publications

Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile.

Eyre DW, Peto TEA, Crook DW, Walker AS, Wilcox MH

Journal of Clinical Microbiology, 2019 Dec 23; issue 1


PMID: 31666367

Similar Datasets

Project description: Clostridium perfringens is a spore-forming anaerobic pathogen responsible for a variety of histotoxic and intestinal infections in humans and animals. High-resolution genotyping aimed at identifying bacteria at the strain level has become increasingly important in modern microbiology for understanding pathogen transmission pathways and tackling infection sources. This study aimed to establish a publicly available genome-wide multilocus sequence typing (MLST) scheme for C. perfringens. A total of 1,431 highly conserved core genes (1.34 megabases; 50% of the reference genome genes) were indexed for a core genome-based MLST (cgMLST) scheme for C. perfringens. The scheme was applied to 282 ecologically and geographically diverse genomes, showing that the cgMLST genotyping results were highly congruent with core genome-based single-nucleotide-polymorphism typing in terms of resolution and tree topology. In addition, cgMLST provided greater discrimination than classical MLST methods for C. perfringens. The usability of the scheme for outbreak analysis was confirmed by reinvestigating published outbreaks of C. perfringens-associated infections in the United States and the United Kingdom. In summary, a publicly available scheme and an allele nomenclature database for genomic typing of C. perfringens have been established and can be used for broad-based and standardized epidemiological studies.
IMPORTANCE: Global epidemiological surveillance of bacterial pathogens is enhanced by the availability of standard tools and the sharing of typing data. Whole-genome sequencing has opened the possibility of high-resolution characterization of bacterial strains down to the clonal and subclonal levels. Core genome multilocus sequence typing is a robust system that uses highly conserved core genes for deep genotyping. The method has been successfully and widely used to describe the epidemiology of various bacterial species. Nevertheless, a cgMLST typing scheme for Clostridium perfringens is currently not publicly available. In this study, we (i) developed a cgMLST typing scheme for C. perfringens, (ii) evaluated the performance of the scheme on different sets of C. perfringens genomes from different hosts and geographic regions as well as from different outbreak situations, and, finally, (iii) made this scheme publicly available, supported by an allele nomenclature database for global and standard genomic typing.

Project description: Streptococcus mutans is one of the primary pathogens responsible for the development of dental caries. Whole-genome sequencing (WGS)-based core genome multilocus sequence typing (cgMLST) approaches have recently been employed in epidemiological studies of specific human pathogens, but this approach has not been reported in studies of S. mutans. Here, we therefore developed a cgMLST scheme for S. mutans. We surveyed 199 available S. mutans genomes to identify cgMLST targets, developing a scheme that incorporated 594 targets from the S. mutans UA159 reference genome. Among 80 S. mutans isolates from 40 children sequenced in this study, the scheme identified 68 cgMLST sequence types (cgSTs), compared with 35 STs identified by multilocus sequence typing (MLST). Fifty-six cgSTs (82.35%) were associated with a single isolate based on our cgMLST scheme, a significantly higher proportion than with the MLST scheme (11.43%). In addition, 58.06% of all MLST profiles with ≥2 isolates were further differentiated by our cgMLST scheme. Topological analyses of the maximum likelihood phylogenetic trees revealed that our cgMLST scheme was more reliable than the MLST scheme. A minimum spanning tree of 145 S. mutans isolates from 10 countries, built from the cgMLST scheme, highlighted the diverse population structure of S. mutans. This cgMLST scheme thus offers a new molecular typing method suitable for evaluating the epidemiological distribution of this pathogen and has the potential to serve as a benchmark for future global studies of the epidemiology of dental caries.
IMPORTANCE: Streptococcus mutans is regarded as a major pathogen responsible for the onset of dental caries. S. mutans can be transmitted between people, especially within families. In this study, we established a new epidemiological approach to S. mutans classification. This approach can effectively differentiate among closely related isolates and is more reliable than the traditional MLST molecular typing method. As such, it has the potential to better support public health strategies aimed at preventing and treating dental caries.
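The cgST counts in this description rest on a simple idea: isolates with identical allele profiles share a sequence type, and a cgST "associated with a single isolate" is a type containing exactly one isolate. The sketch assumes exact profile matching with no missing targets and sequential numbering in sorted isolate order; the study's actual software and its rules for missing targets are not described here, and `assign_cgsts` and `singleton_percentage` are hypothetical names. The last line checks the reported 82.35% as 56 single-isolate cgSTs out of 68.

```python
from collections import Counter
from typing import Dict, Tuple

# A complete allele profile: one allele number per cgMLST target, in scheme order.
Profile = Tuple[int, ...]


def assign_cgsts(profiles: Dict[str, Profile]) -> Dict[str, int]:
    """Give each distinct allele profile a sequential type number (hypothetical rule)."""
    type_of: Dict[Profile, int] = {}
    assignment: Dict[str, int] = {}
    for isolate in sorted(profiles):  # sorted so numbering is reproducible
        profile = profiles[isolate]
        if profile not in type_of:
            type_of[profile] = len(type_of) + 1
        assignment[isolate] = type_of[profile]
    return assignment


def singleton_percentage(assignment: Dict[str, int]) -> float:
    """Percentage of types that contain exactly one isolate."""
    sizes = Counter(assignment.values())
    singletons = sum(1 for n in sizes.values() if n == 1)
    return 100 * singletons / len(sizes)


# Toy profiles over three targets (made up, not data from the study).
toy = {"iso1": (1, 2, 3), "iso2": (1, 2, 3), "iso3": (1, 2, 4), "iso4": (5, 2, 3)}
types = assign_cgsts(toy)
print(types)  # prints {'iso1': 1, 'iso2': 1, 'iso3': 2, 'iso4': 3}
print(round(singleton_percentage(types), 2))  # prints 66.67

# Reported figure: 56 single-isolate cgSTs out of 68 cgSTs.
print(round(100 * 56 / 68, 2))  # prints 82.35
```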
