Dataset Information

Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies.

ABSTRACT: Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.
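The error rates quoted in the abstract can be put in perspective with a small calculation. The sketch below is not the authors' Bayesian method; it is a plain binomial model of counting-based calling, under the assumptions that each read independently shows a spurious mismatch at a given site with probability equal to the reported substitution rate, and that coverage is 10,000 reads (a hypothetical figure not taken from the record). The function name `error_tail` is likewise made up for illustration. At that coverage a 0.1% variant contributes about 10 reads, and the code asks how often errors alone would produce at least that many.

```python
import math


def error_tail(coverage: int, min_reads: int, error_rate: float) -> float:
    """Return P(X >= min_reads) for X ~ Binomial(coverage, error_rate).

    X models the number of reads showing a mismatch at one site purely
    because of sequencing error. Terms are summed in log space so that
    large binomial coefficients do not overflow.
    """
    if min_reads <= 0:
        return 1.0
    total = 0.0
    for i in range(min_reads, coverage + 1):
        log_pmf = (
            math.lgamma(coverage + 1)
            - math.lgamma(i + 1)
            - math.lgamma(coverage - i + 1)
            + i * math.log(error_rate)
            + (coverage - i) * math.log1p(-error_rate)
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


# Hypothetical coverage; a 0.1% variant then accounts for about 10 reads.
COVERAGE = 10_000
VARIANT_READS = 10

# Substitution rates from the abstract for the PCR-amplified samples.
before_correction = error_tail(COVERAGE, VARIANT_READS, 0.0025)  # 0.25%
after_correction = error_tail(COVERAGE, VARIANT_READS, 0.0005)   # 0.05%

print(f"P(>= {VARIANT_READS} error reads) before correction: {before_correction:.4f}")
print(f"P(>= {VARIANT_READS} error reads) after correction:  {after_correction:.4f}")
```

Under these assumptions, 10 mismatching reads are almost always reached by error alone at the uncorrected 0.25% rate, and only occasionally at the corrected 0.05% rate. This illustrates why simple read counting struggles with variants near 0.1% frequency unless errors are handled first.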

SUBMITTER: Zagordi O

PROVIDER: S-EPMC2995073 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature

Publications

Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies.

Zagordi O, Klein R, Däumer M, Beerenwinkel N

Nucleic Acids Research, 2010 Jul 29; issue 21

PMID: 20671025

Similar Datasets

Benchmarking of computational error-correction methods for next-generation sequencing data. | S-EPMC7079412 | biostudies-literature

Efficient error correction for next-generation sequencing of viral amplicons. | S-EPMC3382444 | biostudies-literature

A systematic comparison of error correction enzymes by next-generation sequencing. | S-EPMC5587813 | biostudies-literature

Analysis of error profiles in deep next-generation sequencing data. | S-EPMC6417284 | biostudies-literature

Author Correction: Ultrasensitive amplicon barcoding for next-generation sequencing facilitating sequence error and amplification-bias correction. | S-EPMC7538963 | biostudies-literature

Ultrasensitive amplicon barcoding for next-generation sequencing facilitating sequence error and amplification-bias correction. | S-EPMC7324614 | biostudies-literature

ConPADE: genome assembly ploidy estimation from next-generation sequencing data. | S-EPMC4400156 | biostudies-literature

Estimation of allele frequency and association mapping using next-generation sequencing data. | S-EPMC3212839 | biostudies-other

NucVoter: A Voting Algorithm for Reliable Nucleosome Prediction Using Next-Generation Sequencing Data. | S-EPMC4393064 | biostudies-literature

Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. | S-EPMC3022557 | biostudies-literature