Dataset Information

Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile.


ABSTRACT: Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
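
The core idea in the abstract, replacing centrally numbered alleles with a reproducible hash of each allele sequence and then counting gene differences between profiles, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the use of MD5, the function names (allele_hash, hash_profile, gene_differences), the uppercase normalization, and the choice to compare only genes present in both profiles are all assumptions made for illustration.

```python
import hashlib
from typing import Mapping, Dict


def allele_hash(sequence: str) -> str:
    """Return a reproducible hash for one allele sequence.

    Assumption: MD5 of the uppercased, whitespace-stripped sequence.
    Any deterministic hash would serve the same purpose in this sketch.
    """
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def hash_profile(alleles: Mapping[str, str]) -> Dict[str, str]:
    """Convert a {gene: allele sequence} mapping into a {gene: hash} profile."""
    return {gene: allele_hash(seq) for gene, seq in alleles.items()}


def gene_differences(a: Mapping[str, str], b: Mapping[str, str]) -> int:
    """Count genes whose hashes differ between two profiles.

    Assumption: genes missing from either profile are ignored
    rather than counted as differences.
    """
    shared = a.keys() & b.keys()
    return sum(1 for gene in shared if a[gene] != b[gene])


# Hypothetical toy genomes; real core genome schemes cover far more genes.
genome_1 = hash_profile({"geneA": "ATGC", "geneB": "GGCC", "geneC": "TTAA"})
genome_2 = hash_profile({"geneA": "atgc", "geneB": "GGCA"})
n_diff = gene_differences(genome_1, genome_2)  # only geneB differs
```

Because identical sequences always yield identical hashes, two laboratories can compute profiles independently and compare them without consulting a shared allele-numbering database, which is the property the abstract highlights.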

SUBMITTER: Eyre DW 

PROVIDER: S-EPMC6935933 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile.

Eyre DW, Peto TEA, Crook DW, Walker AS, Wilcox MH

Journal of Clinical Microbiology, 23 Dec 2019, issue 1


Similar Datasets

| S-EPMC5971537 | biostudies-literature
| S-EPMC8549748 | biostudies-literature
| S-EPMC10651541 | biostudies-literature
| S-EPMC9170587 | biostudies-literature
| S-EPMC5340756 | biostudies-literature
| S-EPMC4540939 | biostudies-literature
| S-EPMC8106710 | biostudies-literature
| S-EPMC6425188 | biostudies-literature
| S-EPMC7343978 | biostudies-literature
| S-EPMC427854 | biostudies-literature