In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification.
ABSTRACT: Bacteria of the genus Shigella, consisting of 4 species and >50 serotypes, cause shigellosis, a foodborne disease of significant morbidity, mortality, and economic loss worldwide. Classical Shigella identification based on selective media and serology is tedious, time-consuming, expensive, and not always accurate. A molecular diagnostic assay does not distinguish Shigella at the species level or from enteroinvasive Escherichia coli (EIEC). We inspected genomic sequences from 221 Shigella isolates and observed low concordance rates between conventional designation and molecular serotyping: 86.4% and 80.5% at the species and serotype levels, respectively. Serotype determinants for 6 additional serotypes were identified. Examination of differentiation gene markers commonly perceived as characteristic hallmarks in Shigella showed high variability among different serotypes. Using this information, we developed ShigaTyper, an automated workflow that utilizes limited computational resources to accurately and rapidly determine 59 Shigella serotypes using Illumina paired-end whole-genome sequencing (WGS) reads. Shigella serotype determinants and species-specific diagnostic markers were first identified through read alignment to an in-house curated reference sequence database. Relying on sequence hits that passed a threshold level of coverage and accuracy, serotype could be unambiguously predicted within 1 min for an average-size WGS sample of ~500 MB. Validation with WGS data from 380 isolates showed an accuracy rate of 98.2%. This pipeline is the first step toward building a comprehensive WGS-based analysis pipeline of Shigella spp. in a field laboratory setting, where speed is essential and resources need to be more cost-effectively dedicated.
IMPORTANCE: Shigella causes diarrheal disease with serious public health implications. However, conventional Shigella identification methods are laborious and time-consuming and can be erroneous due to the high similarity between Shigella and enteroinvasive Escherichia coli (EIEC) and cross-reactivity between serotyping antisera. Further, serotype interpretation is complicated for inexperienced users. To develop an easier method with higher accuracy based on whole-genome sequencing (WGS) for Shigella serotyping, we systematically examined genomic information of Shigella isolates from 53 serotypes to define rules for differentiation and serotyping. We created ShigaTyper, an automated pipeline that accurately and rapidly excludes non-Shigella isolates and identifies 59 Shigella serotypes using Illumina paired-end WGS reads. A serotype can be unambiguously predicted at a data processing speed of 538 MB/min with 98.2% accuracy on a regular laptop. Once it is installed, training in bioinformatics analysis and Shigella genetics is not required. This pipeline is particularly useful to general microbiologists in field laboratories.
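The hit-filtering step described in the abstract (keeping only sequence hits that pass a threshold level of coverage and accuracy) can be illustrated with a minimal Python sketch. This is not ShigaTyper's code: the `Hit` record, the marker names, and the cutoff values `MIN_COVERAGE` and `MIN_IDENTITY` are all hypothetical, since the abstract does not state the actual thresholds or the contents of the reference database. The only figure taken from the text is the reported throughput of 538 MB/min.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the abstract says hits must pass a threshold level
# of coverage and accuracy but does not give the actual values.
MIN_COVERAGE = 0.50
MIN_IDENTITY = 0.80

# Throughput reported in the abstract (MB of WGS reads processed per minute).
REPORTED_MB_PER_MIN = 538.0


@dataclass(frozen=True)
class Hit:
    """One reference sequence that reads aligned to (illustrative record)."""
    marker: str      # hypothetical marker name, not a real database entry
    coverage: float  # fraction of the reference length covered, 0..1
    identity: float  # fraction of matching bases in the covered region, 0..1


def passing_markers(hits, min_coverage=MIN_COVERAGE, min_identity=MIN_IDENTITY):
    """Return the sorted names of markers whose hits pass both cutoffs."""
    return sorted(
        {
            h.marker
            for h in hits
            if h.coverage >= min_coverage and h.identity >= min_identity
        }
    )


def estimated_minutes(sample_mb, mb_per_min=REPORTED_MB_PER_MIN):
    """Estimate processing time for a sample from the reported throughput."""
    return sample_mb / mb_per_min


if __name__ == "__main__":
    hits = [
        Hit("marker_A", coverage=0.98, identity=0.99),  # passes both cutoffs
        Hit("marker_B", coverage=0.30, identity=0.99),  # coverage too low
        Hit("marker_C", coverage=0.95, identity=0.60),  # identity too low
    ]
    print(passing_markers(hits))             # ['marker_A']
    print(f"{estimated_minutes(500):.2f}")   # 0.93
```

At the reported throughput, a 500 MB sample works out to roughly 0.93 min, which is consistent with the abstract's statement that an average-size sample is typed within 1 min. Mapping the surviving markers to one of the 59 serotypes would require the serotyping rules defined in the paper, which the abstract does not reproduce.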
SUBMITTER: Wu Y
PROVIDER: S-EPMC6585509 | biostudies-literature | 2019 Apr
REPOSITORIES: biostudies-literature