Dataset Information

Estimating DNA coverage and abundance in metagenomes using a gamma approximation.

ABSTRACT:

Motivation

Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length, and the relative abundances of their source genomes in the microbial community. Low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population because of its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluate the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
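The idea of fitting a gamma model to observed bins and extrapolating to unsequenced ones can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, for illustration only, that the number of reads covering a bin is Poisson distributed with a gamma-distributed rate (which gives a negative binomial), fits that model to bins observed at least once by a coarse grid search, and uses the fitted probability of zero reads to estimate the number of unseen bins. All function names, the grid, and the example counts are hypothetical.

```python
import math

def nb_log_pmf(x, k, theta):
    """Log-probability of x reads on a bin under a Poisson-gamma mixture.

    Assumption: per-bin read rates follow Gamma(shape=k, scale=theta),
    so read counts follow a negative binomial distribution.
    """
    p = theta / (1.0 + theta)
    return (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
            + x * math.log(p) + k * math.log(1.0 - p))

def zero_prob(k, theta):
    """Probability that a bin receives no reads under the mixture."""
    return (1.0 + theta) ** (-k)

def truncated_loglik(counts, k, theta):
    """Log-likelihood of counts >= 1, conditioning on the bin being observed."""
    log_norm = math.log(1.0 - zero_prob(k, theta))
    return sum(nb_log_pmf(x, k, theta) - log_norm for x in counts)

def fit_gamma_grid(counts, shapes, scales):
    """Coarse grid search for the (shape, scale) pair with highest likelihood."""
    best = max((truncated_loglik(counts, k, t), k, t)
               for k in shapes for t in scales)
    return best[1], best[2]

def estimate_unseen_bins(counts, k, theta):
    """Expected number of bins with zero reads, given the observed bins."""
    p0 = zero_prob(k, theta)
    return len(counts) * p0 / (1.0 - p0)

if __name__ == "__main__":
    # Hypothetical reads-per-bin counts for bins seen at least once.
    observed = [1, 1, 1, 2, 1, 3, 1, 2, 5, 1, 1, 2, 8, 1, 4]
    grid = [0.25 * i for i in range(1, 41)]  # hypothetical search grid
    k_hat, theta_hat = fit_gamma_grid(observed, grid, grid)
    print("shape:", k_hat, "scale:", theta_hat)
    print("estimated unseen bins:", estimate_unseen_bins(observed, k_hat, theta_hat))
```

A grid search keeps the sketch dependency-free. A real analysis would likely use a numerical optimizer and check goodness of fit before trusting the extrapolation, in line with the abstract's caveat that an estimate is possible only when the gamma approximation adequately fits the data.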

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Hooper SD

PROVIDER: S-EPMC2815663 | biostudies-literature | 2010 Feb

REPOSITORIES: biostudies-literature

Publications

Estimating DNA coverage and abundance in metagenomes using a gamma approximation.

Hooper SD, Dalevi D, Pati A, Mavromatis K, Ivanova NN, Kyrpides NC

Bioinformatics (Oxford, England), 2009-12-14, issue 3

PMID: 20008478

Similar Datasets

Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes. | S-EPMC4424905 | biostudies-literature

Estimating species distribution and abundance in river networks using environmental DNA. | S-EPMC6243290 | biostudies-literature

Prider: multiplexed primer design using linearly scaling approximation of set coverage. | S-EPMC9097127 | biostudies-literature

Estimating Abundance of Siberian Roe Deer Using Fecal-DNA Capture-Mark-Recapture in Northeast China. | S-EPMC7401656 | biostudies-literature

Abundance profiling of specific gene groups using precomputed gut metagenomes yields novel biological hypotheses. | S-EPMC5407692 | biostudies-literature

HyLight: Strain aware assembly of low coverage metagenomes. | S-EPMC11458758 | biostudies-literature

Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods. | S-EPMC4380030 | biostudies-literature

Estimating Lion Abundance using N-mixture Models for Social Species. | S-EPMC5082374 | biostudies-literature

Estimating family planning coverage from contraceptive prevalence using national household surveys. | S-EPMC4642361 | biostudies-literature

Estimating species richness using environmental DNA. | S-EPMC4972244 | biostudies-literature