Dataset Information

In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification.

ABSTRACT: Bacteria of the genus Shigella, consisting of 4 species and >50 serotypes, cause shigellosis, a foodborne disease of significant morbidity, mortality, and economic loss worldwide. Classical Shigella identification based on selective media and serology is tedious, time-consuming, expensive, and not always accurate. A molecular diagnostic assay does not distinguish Shigella at the species level or from enteroinvasive Escherichia coli (EIEC). We inspected genomic sequences from 221 Shigella isolates and observed low concordance rates between conventional designation and molecular serotyping: 86.4% and 80.5% at the species and serotype levels, respectively. Serotype determinants for 6 additional serotypes were identified. Examination of differentiation gene markers commonly perceived as characteristic hallmarks in Shigella showed high variability among different serotypes. Using this information, we developed ShigaTyper, an automated workflow that utilizes limited computational resources to accurately and rapidly determine 59 Shigella serotypes using Illumina paired-end whole-genome sequencing (WGS) reads. Shigella serotype determinants and species-specific diagnostic markers were first identified through read alignment to an in-house curated reference sequence database. Relying on sequence hits that passed a threshold level of coverage and accuracy, serotype could be unambiguously predicted within 1 min for an average-size WGS sample of ∼500 MB. Validation with WGS data from 380 isolates showed an accuracy rate of 98.2%. This pipeline is the first step toward building a comprehensive WGS-based analysis pipeline of Shigella spp. in a field laboratory setting, where speed is essential and resources need to be more cost-effectively dedicated.

IMPORTANCE: Shigella causes diarrheal disease with serious public health implications. However, conventional Shigella identification methods are laborious and time-consuming and can be erroneous due to the high similarity between Shigella and enteroinvasive Escherichia coli (EIEC) and cross-reactivity between serotyping antisera. Further, serotype interpretation is complicated for inexperienced users. To develop an easier method with higher accuracy based on whole-genome sequencing (WGS) for Shigella serotyping, we systematically examined genomic information of Shigella isolates from 53 serotypes to define rules for differentiation and serotyping. We created ShigaTyper, an automated pipeline that accurately and rapidly excludes non-Shigella isolates and identifies 59 Shigella serotypes using Illumina paired-end WGS reads. A serotype can be unambiguously predicted at a data processing speed of 538 MB/min with 98.2% accuracy from a regular laptop. Once it is installed, training in bioinformatics analysis and Shigella genetics is not required. This pipeline is particularly useful to general microbiologists in field laboratories.
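
The two speed figures quoted in the abstract (a processing rate of 538 MB/min and a prediction within 1 min for a sample of about 500 MB) can be checked against each other with simple arithmetic. The following is a minimal sketch, not part of ShigaTyper; the function name and the assumption of a constant throughput are illustrative only.

```python
def processing_time_seconds(sample_mb: float, rate_mb_per_min: float = 538.0) -> float:
    """Estimate processing time in seconds, assuming a constant throughput.

    The default rate (538 MB/min) is the figure quoted in the abstract;
    treating it as constant across sample sizes is an assumption.
    """
    if rate_mb_per_min <= 0:
        raise ValueError("rate_mb_per_min must be positive")
    return sample_mb / rate_mb_per_min * 60.0


# An average-size sample of about 500 MB, as described in the abstract.
print(f"{processing_time_seconds(500):.1f} s")
```

Under that assumption a 500 MB sample takes roughly 55.8 s, which agrees with the "within 1 min" claim.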

SUBMITTER: Wu Y

PROVIDER: S-EPMC6585509 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification.

Yun Wu, Henry K. Lau, Teresa Lee, David K. Lau, Justin Payne

Applied and Environmental Microbiology, 2019 Mar 22, issue 7

PMID: 30709819

Similar Datasets

Project description: We compared the performance of four open-source in silico Salmonella typing tools (SeqSero, SeqSero2, Salmonella In Silico Typing Resource [SISTR], and Metric Oriented Sequence Typer [MOST]) to assess their potential for replacing laboratory serological testing with serovar predictions from whole-genome sequencing data. We conducted a retrospective analysis of 1,624 Salmonella isolates of 72 serovars submitted to the German National Salmonella Reference Laboratory between 1999 and 2019. All isolates are derived from animal and foodstuff origins. We conducted Illumina short-read sequencing and compared the in silico serovar prediction results with the results of routine laboratory serotyping. We found the best-performing in silico serovar prediction tool to be SISTR, with 94% correctly typed isolates, followed by SeqSero2 (87%), SeqSero (81%), and MOST (79%). Furthermore, we found that mapping-based tools like SeqSero and SeqSero2 (allele mode) were more reliable for the prediction of monophasic variants, while sequence type and cluster-based methods like MOST and SISTR (core-genome multilocus sequence type [cgMLST]), showed greater resilience when confronted with GC-biased sequencing data. We showed that the choice of library preparation kit could substantially affect O antigen detection, due to the low GC content of the wzx and wzy genes. Although the accuracy of computational serovar predictions is still not quite on par with traditional serotyping by Salmonella reference laboratories, the command-line tools investigated in this study perform a rapid, efficient, inexpensive, and reproducible analysis, which can be integrated into in-house characterization pipelines. Based on our results, we find SISTR most suitable for automated, routine serotyping for public health surveillance of Salmonella. IMPORTANCE: Salmonella spp. are important foodborne pathogens. To reduce the number of infected patients, it is essential to understand which subtypes of the bacteria cause disease outbreaks. Traditionally, characterization of Salmonella requires serological testing, a laboratory method by which Salmonella isolates can be classified into over 2,600 distinct subtypes, called serovars. Due to recent advances in whole-genome sequencing, many tools have been developed to replace traditional testing methods with computational analysis of genome sequences. It is crucial to validate that these tools, many already in use for routine surveillance, deliver accurate and reliable serovar information. In this study, we set out to compare which of the currently available open-source command-line tools is most suitable to replace serological testing. A thorough evaluation of the differing computational approaches is highly important to ensure the backward compatibility of serotyping data and to maintain comparability between laboratories.

Project description: Until recently, traditional serology and the Kauffmann White Scheme (KWS) have been the gold standard for Salmonella serotyping. Whole Genome Sequencing (WGS) has now emerged as an alternative in this field. Serotype information remains a cornerstone in food safety and public health activities to reduce the burden of salmonellosis. At the same time, recent advances in WGS have improved the ability to perform advanced pathogen characterization while improving trace back investigations to determine the source of foodborne illness during outbreaks. Serovar prediction based on WGS can be performed using in silico data analysis tools. Three such tools have been developed: (a) Salmonella in silico Typing Resource (SISTR), (b) SeqSero, and (c) in silico 7-gene MLST ST (Multilocus Sequence Typing Sub-Typing), which was generated using the SISTR platform. Public health officials around the world are diligently working to validate these tools for replacing traditional surveillance methods to provide a more powerful approach for molecular epidemiology in support of public health investigations. In this study, we report a retrospective analysis of our laboratory inventory of 1,041 Salmonella isolates collected between 1999 and 2017. These isolates are of public health significance since they all came from either food, feed or environmental swabs. They were all serotyped by both traditional serology and WGS using an in silico SeqSero tool for serovar prediction. Both predicted identical Salmonella serotypes in 899 isolates (86.4% of the 1,041 Salmonella isolates). SeqSero assignments differed from traditional serological testing in 80 isolates (7.7%) and no serotype prediction was ascertained from 62 isolates (5.9%). This retrospective study is an excellent example of using WGS and SeqSero as a data analysis tool to predict Salmonella serotypes that can provide numerous advantages including molecular and genetic details regarding the characteristics of the Salmonella isolates compared to traditional KWS serotyping. In conclusion, it is evident that using WGS and in silico tools for Salmonella serotyping might someday replace traditional serotyping.

Project description: Salmonella is one of the most common causes of food-borne diseases worldwide. While Salmonella molecular subtyping by Whole Genome Sequencing (WGS) is increasingly used for outbreak and source tracking investigations, serotyping remains as a first-line characterization of Salmonella isolates. The traditional phenotypic method for serotyping is logistically challenging, as it requires the use of more than 150 specific antisera and well trained personnel to interpret the results. Consequently, it is not a routine method for the majority of laboratories. Several rapid molecular methods targeting O and H loci or surrogate genomic markers have been developed as alternative solutions. With the expansion of WGS, in silico Salmonella serotype prediction using WGS data is available. Here, we compared a microarray method using molecular markers, the Check and Trace Salmonella assay (CTS) and a WGS-based serotype prediction tool that targets molecular determinants of serotype (SeqSero) to the traditional phenotypic method using 100 strains representing 45 common and uncommon serotypes. Compared to the traditional method, the CTS assay correctly serotyped 97% of the strains; four strains gave a double serotype prediction. Among the inconclusive data, one strain was not predicted and two strains were incorrectly identified. SeqSero was evaluated with two versions (SeqSero 1 and the alpha test version of SeqSero 2). The correct antigenic formula was predicted by SeqSero 1 for 96 and 95% of strains using raw reads and assembly, respectively. However, 34 and 33% of these predictions included multiple serotypes by raw reads and assembly. With raw reads, one strain was not identified and three strains were discordant with phenotypic serotyping result. With assembly, three strains were not predicted and two strains were incorrectly predicted. While still under development, SeqSero 2 maintained the accuracy of antigenic formula prediction at 98% and reduced multiple serotype prediction rate to 13%. One strain had no prediction and one strain was incorrectly predicted. Our study indicates that the CTS assay is a good alternative for routine laboratories as it is an easy to use method with a short turn-around-time. SeqSero is a reliable replacement for phenotypic serotyping if WGS is routinely implemented.
