Dataset Information

PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data.


ABSTRACT: The identification of short insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) from Ion Torrent and 454 reads is a challenging problem, essentially because these technologies are prone to sequencing errors at homopolymers and can therefore introduce spurious indels into reads. Most existing mapping programs do not model homopolymer errors when aligning reads against the reference. The resulting alignments then contain various kinds of mismatches and indels that confound the accurate determination of variant loci and alleles. To address these challenges, we realign reads against the reference using our previously proposed hidden Markov model, which models homopolymer errors, and then merge these pairwise alignments into a weighted alignment graph. Based on this weighted alignment graph and the hidden Markov model, we develop a method called PyroHMMvar, which can simultaneously detect short indels and SNPs, as demonstrated in human resequencing data. Specifically, by applying our method to simulated diploid datasets, we demonstrate that PyroHMMvar produces more accurate results than state-of-the-art methods, such as Samtools and GATK, and is less sensitive to mapping parameter settings than the other methods. We also apply PyroHMMvar to one human whole genome resequencing dataset, and the results confirm that PyroHMMvar predicts SNPs and indels accurately. Source code is freely available at https://code.google.com/p/pyrohmmvar/; it is implemented in C++ and supported on Linux.
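
The homopolymer problem described in the abstract can be illustrated with a short Python sketch. This is not PyroHMMvar's algorithm (which uses a hidden Markov model and a weighted alignment graph); it only shows, under simplified assumptions, how a read that miscounts a homopolymer run differs from the reference solely in run length, which a conventional aligner would report as an indel. The function names and the example sequences are hypothetical.

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse a sequence into (base, run_length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def homopolymer_length_differences(reference, read):
    """Return (base, ref_len, read_len) for runs whose lengths differ.

    Returns None when the two sequences differ in base order, i.e. when
    the difference is not purely a change in homopolymer length.
    Illustrative helper only; not part of PyroHMMvar.
    """
    ref_runs = run_length_encode(reference)
    read_runs = run_length_encode(read)
    # The sequences are comparable run-by-run only if the order of bases matches.
    if [base for base, _ in ref_runs] != [base for base, _ in read_runs]:
        return None
    return [
        (base, n_ref, n_read)
        for (base, n_ref), (_, n_read) in zip(ref_runs, read_runs)
        if n_ref != n_read
    ]


# Hypothetical example: the read undercalls a run of five T's as four.
reference = "ACGTTTTTGCA"
read = "ACGTTTTGCA"
print(homopolymer_length_differences(reference, read))
```

In this toy case the only difference is the length of the T run, so an aligner that does not model homopolymer errors would report a one-base deletion even though no true variant is present.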

SUBMITTER: Zeng F 

PROVIDER: S-EPMC3888126 | biostudies-literature | 2013 Nov

REPOSITORIES: biostudies-literature

Publications

PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data.

Zeng F, Jiang R, Chen T

Bioinformatics (Oxford, England), 2013-08-31, issue 22


Similar Datasets

| S-EPMC3711422 | biostudies-literature
| S-EPMC2442091 | biostudies-literature
| S-EPMC3926339 | biostudies-literature
| S-EPMC4748696 | biostudies-literature
| PRJEB11475 | ENA
| S-EPMC5319672 | biostudies-literature
| S-EPMC6330128 | biostudies-literature
| PRJNA172889 | ENA
| PRJNA172891 | ENA
| PRJEB5854 | ENA