
Dataset Information


Mapping short DNA sequencing reads and calling variants using mapping quality scores.


ABSTRACT: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
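The mapping quality idea in the abstract can be illustrated with a small sketch. It assumes the usual Phred scaling, Q = -10 log10(P(alignment is wrong)), a uniform prior over candidate positions, and a read likelihood at each position derived from the summed base qualities of its mismatches. The function names `phred` and `mapping_quality` and the cap of 99 are hypothetical choices for illustration; this is not MAQ's actual implementation, which relies on further approximations.

```python
import math


def phred(p_error, cap=99):
    """Convert an error probability to a Phred-scaled score, capped at `cap`."""
    if p_error <= 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_error))))


def mapping_quality(mismatch_quality_sums):
    """Return (index of best candidate, Phred-scaled mapping quality).

    `mismatch_quality_sums` holds, for each candidate position, the sum of
    base qualities at the bases where the read mismatches the reference.
    Assumption: likelihood at a position is proportional to 10^(-sum / 10).
    """
    likelihoods = [10.0 ** (-q / 10.0) for q in mismatch_quality_sums]
    total = sum(likelihoods)
    # Ties are resolved in favour of the first candidate.
    best = max(range(len(likelihoods)), key=likelihoods.__getitem__)
    # Posterior probability that the best candidate is NOT the true origin.
    p_wrong = (total - likelihoods[best]) / total
    return best, phred(p_wrong)


if __name__ == "__main__":
    # A perfect hit competing with a hit whose mismatches sum to quality 30.
    print(mapping_quality([0, 30]))
    # Two equally good hits: the read is ambiguous, so the score is low.
    print(mapping_quality([20, 20]))
```

Under these assumptions a read with two equally good placements receives a score near 3 (a 50% chance of being wrong), while a read whose only competitor is much worse scores close to the quality gap between the two hits.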

SUBMITTER: Li H 

PROVIDER: S-EPMC2577856 | biostudies-literature | 2008 Nov

REPOSITORIES: biostudies-literature


Publications

Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Heng Li, Jue Ruan, Richard Durbin

Genome Research, 2008 Aug 19; issue 11



Similar Datasets

| S-EPMC2335322 | biostudies-literature
| S-EPMC9122538 | biostudies-literature
| S-EPMC9237988 | biostudies-literature
| S-EPMC8693577 | biostudies-literature
| S-EPMC3493122 | biostudies-literature
| S-EPMC8551035 | biostudies-literature
| S-EPMC3534400 | biostudies-literature
| S-EPMC4021105 | biostudies-literature
| S-EPMC2752623 | biostudies-literature
| S-EPMC7842384 | biostudies-literature