Dataset Information

Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks.


ABSTRACT:

Background

Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. The two common WGS analysis procedures, single nucleotide polymorphism (SNP) analysis and genome assembly, depend heavily on WGS data quality.

Methods

Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNP analysis results in E. coli and S. enterica clusters.
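
Three of the four quality metrics named above can be computed directly from raw FASTQ records. The following is a minimal sketch in Python, not the authors' code: the function names `parse_fastq` and `read_metrics` are hypothetical, and the quality string is assumed to use the common Phred+33 encoding. Library insert size is left out because it generally requires paired reads mapped to a reference or assembly.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        # Line 0: '@' header, line 1: bases, line 2: '+', line 3: qualities
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def read_metrics(seq, qual, offset=33):
    """Per-read metrics; `offset=33` is an assumed Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in qual]
    return {
        "length": len(seq),
        "mean_phred": mean(scores) if scores else 0.0,
        # Fraction of ambiguous base calls (N) in the read
        "n_fraction": seq.upper().count("N") / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    # Illustrative record only, not data from the study
    example = "@read1\nACGTN\n+\nIIII#\n"
    for name, seq, qual in parse_fastq(example):
        print(name, read_metrics(seq, qual))
```

Averaging these per-read values over a sample gives one summary value per metric, which could then be compared against downstream SNP or assembly results.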

Results

Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines, assessing how well each improved the four quality metrics and the downstream SNP analysis and genome assembly. The most effective read-healing pipelines for SNP analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNP pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements for healed E. coli reads only, while Skesa yielded no significant improvements on healed reads.
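
The two trimming strategies mentioned above can be illustrated with a short Python sketch. This is not the implementation of any pipeline evaluated in the study: the function names, the threshold `min_q=20`, and the Phred+33 `offset` are assumptions chosen for illustration, and the quality trimmer shown here is a simple end-clipping rule rather than the sliding-window algorithms that real trimming tools often use.

```python
def fixed_width_trim(seq, qual, left=0, right=0):
    """Remove a fixed number of bases from the 5' (left) and 3' (right) ends."""
    end = len(seq) - right
    if end <= left:
        # Nothing survives the trim
        return "", ""
    return seq[left:end], qual[left:end]


def quality_trim(seq, qual, min_q=20, offset=33):
    """Clip bases from both ends while their PHRED score is below min_q.

    `min_q` and `offset` are assumed values (Phred+33 encoding).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    start = 0
    while start < end and ord(qual[start]) - offset < min_q:
        start += 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Illustrative read only: '#' encodes Q2, 'I' encodes Q40 under Phred+33
    seq, qual = "ACGTAC", "#IIII#"
    print(quality_trim(seq, qual))
    print(fixed_width_trim(seq, qual, left=1, right=1))
```

Fixed-width trimming ignores the quality string and removes the same positions from every read, whereas quality-based trimming adapts to each read, so combining them is a matter of calling one function on the output of the other.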

Conclusions

PHRED scores will continue to be a crucial quality metric, albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNP analyses, different read-healing approaches are likely needed for genome assembly or other emerging WGS analysis methodologies.

SUBMITTER: Wagner DD 

PROVIDER: S-EPMC8627651 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks.

Wagner DD, Carleton HA, Trees E, Katz LS

PeerJ, 2021-11-25



Similar Datasets

| S-EPMC8317677 | biostudies-literature
| S-EPMC10752928 | biostudies-literature
| S-EPMC7285907 | biostudies-literature
| S-EPMC3556524 | biostudies-literature
| S-EPMC9746130 | biostudies-literature
| S-EPMC3626608 | biostudies-literature
| S-EPMC4767972 | biostudies-literature
| S-EPMC6478655 | biostudies-literature
| S-EPMC10755156 | biostudies-literature
| S-EPMC4928038 | biostudies-literature