
Dataset Information


Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads.


ABSTRACT: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose public safety and health risks. Mapping of DNA reads to genomes plays a fundamental role in this approach, where accurate alignment and classification of sequencing data are crucial. Standard mapping methods crudely treat bases as independent of their neighbors. Accuracy might be improved by using higher-order paired hidden Markov models (HMMs), which model neighbor effects but introduce design and implementation issues that have typically made them impractical for read mapping applications. We present a variable-order paired HMM, which we term VarHMM, that addresses central issues involved in higher-order modeling for sequence alignment. Compared with existing alignment methods, VarHMM is able to model higher-order distributions and quantify alignment probabilities with greater detail and accuracy. In a series of comparison tests, in which Ion Torrent sequenced DNA was mapped to similar bacterial strains, VarHMM consistently provided better strain discrimination than any of the other alignment methods we compared it with. Our results demonstrate the advantages of higher-order probability distribution modeling and also suggest that further development of such models would benefit read mapping in a range of other applications.
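
The contrast the abstract draws between treating bases as independent and modeling neighbor effects can be illustrated with a toy example. The sketch below is not VarHMM and not a paired HMM: it is a plain fixed-order Markov chain over a single sequence, and the function names (`train_markov`, `mean_log_likelihood`), the pseudocount, and the example sequence are all assumptions made for illustration. It only shows that conditioning each base on its preceding context can assign higher probability to a sequence with neighbor dependence than an order-0 (independent bases) model does.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet for this illustration


def train_markov(seq, order):
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(seq)):
        context = seq[i - order:i]  # empty string when order == 0
        counts[context][seq[i]] += 1.0
    return counts


def mean_log_likelihood(seq, counts, order, pseudocount=1.0):
    """Average per-base log-probability of `seq` under the counted model.

    Add-`pseudocount` smoothing keeps unseen transitions from having
    zero probability.
    """
    total = 0.0
    scored = 0
    for i in range(order, len(seq)):
        ctx_counts = counts.get(seq[i - order:i], {})
        num = ctx_counts.get(seq[i], 0.0) + pseudocount
        den = sum(ctx_counts.values()) + pseudocount * len(ALPHABET)
        total += math.log(num / den)
        scored += 1
    return total / scored


# Hypothetical sequence with strong neighbor dependence.
sequence = "ACGT" * 50

order0 = train_markov(sequence, 0)  # bases treated as independent
order1 = train_markov(sequence, 1)  # each base conditioned on its predecessor

ll0 = mean_log_likelihood(sequence, order0, 0)
ll1 = mean_log_likelihood(sequence, order1, 1)
print(f"order 0: {ll0:.3f}  order 1: {ll1:.3f}")
```

On this artificial input the order-1 model scores markedly higher per base than the order-0 model, because every base is fully predictable from its predecessor. Real reads are far noisier, and a variable-order paired HMM as described in the abstract must additionally model alignment between a read and a genome, which this sketch does not attempt.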

SUBMITTER: Poulsen TM 

PROVIDER: S-EPMC5469136 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature


Publications

Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads.

Poulsen TM, Frith M

BMC Bioinformatics, 2017-06-12, 1



Similar Datasets

| PRJEB11475 | ENA
| S-EPMC8599889 | biostudies-literature
| S-EPMC3411582 | biostudies-literature
| PRJNA172890 | ENA
| PRJNA172889 | ENA
| PRJEB5854 | ENA
| PRJNA172891 | ENA
| PRJEB5855 | ENA
| S-EPMC4249215 | biostudies-literature
| S-EPMC7240266 | biostudies-literature