Estimating haplotype frequency and coverage of databases.
ABSTRACT: A variety of forensic, population, and disease studies are based on haploid DNA (e.g. mitochondrial DNA or Y-chromosome data). For any set of genetic markers, databases of conventional size will normally contain only a fraction of all haplotypes. For several applications, reliable estimates of haplotype frequencies, the total number of haplotypes, and the coverage of the database (the probability that the next random haplotype is contained in the database) are useful. We propose different approaches to the problem based on classical methods as well as new applications of Principal Component Analysis (PCA). We also discuss previous proposals based on saturation curves. Several conclusions can be drawn from simulated and real data. First, classical estimates of the fraction of unseen haplotypes can be seriously biased. Second, there is no obvious way to decide on the required sample size using traditional approaches: methods based on hypothesis testing or on the length of confidence intervals may appear artificial, since no single test or parameter stands out as particularly relevant. Rather, the coverage may be more relevant, since it indicates the percentage of different haplotypes that are contained in a database; if the coverage is low, there is a considerable chance that the next haplotype to be observed does not appear in the database, which indicates that the database needs to be expanded. Finally, freeware and example data sets accompanying the methods discussed in this paper are available at http://folk.uio.no/thoree/nhap/.
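To illustrate the coverage concept, the following minimal Python sketch uses the classical Good-Turing estimator, C = 1 - n1/N, where n1 is the number of haplotypes observed exactly once (singletons) and N is the database size. This is not the authors' software from the URL above, and the haplotype strings are purely hypothetical toy data.

from collections import Counter

def coverage_estimate(haplotypes):
    """Good-Turing style estimate of the probability that the next
    sampled haplotype is already present in the database."""
    counts = Counter(haplotypes)
    n = len(haplotypes)                                      # database size N
    singletons = sum(1 for c in counts.values() if c == 1)   # n1
    return 1.0 - singletons / n

# Toy Y-STR-like database: each string stands for one observed haplotype.
db = ["14-12-28", "14-12-28", "15-13-29", "13-12-30",
      "14-12-28", "15-13-29", "16-12-27"]
print("Observed distinct haplotypes:", len(set(db)))
print("Estimated coverage: %.2f" % coverage_estimate(db))

With two singletons among seven observations, the estimated coverage is 1 - 2/7 ≈ 0.71, i.e. roughly a 29% chance that the next haplotype sampled is not yet in this toy database.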
SUBMITTER: Egeland T
PROVIDER: S-EPMC2602601 | biostudies-literature | 2008
REPOSITORIES: biostudies-literature