
Dataset Information


Biases in genome reconstruction from metagenomic data.


ABSTRACT: Background: Advances in sequencing, assembly, and assortment of contigs into species-specific bins have enabled the reconstruction of genomes from metagenomic data (metagenome-assembled genomes, MAGs). Though this is a powerful technique, it is difficult to determine whether assembly and binning techniques are accurate when applied to environmental metagenomes, because complete reference genome sequences against which to check the resulting MAGs are lacking. Methods: We compared MAGs derived from an enrichment culture containing ~20 organisms to complete genome sequences of 10 organisms isolated from the enrichment culture. Factors commonly considered in binning software (nucleotide composition and sequence repetitiveness) were calculated for both the correctly binned and not-binned regions. This direct comparison revealed biases in sequence characteristics and gene content in the not-binned regions. Additionally, the composition of three public data sets representing MAGs reconstructed from the Tara Oceans metagenomic data was compared to a set of representative genomes available through NCBI RefSeq, to verify that the biases identified were observable in more complex data sets and with three contemporary binning software packages. Results: Repeat sequences were frequently not binned in the genome reconstruction processes, as were sequence regions with variant nucleotide composition. Genes encoded on the not-binned regions were strongly biased towards ribosomal RNAs, transfer RNAs, mobile element functions and genes of unknown function. Our results support genome reconstruction as a robust process and suggest that reconstructions determined to be >90% complete are likely to effectively represent organismal function; however, population-level genotypic heterogeneity in natural populations, such as uneven distribution of plasmids, can lead to incorrect inferences.

SUBMITTER: Nelson WC 

PROVIDER: S-EPMC7605220 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Biases in genome reconstruction from metagenomic data.

Nelson William C, Tully Benjamin J, Mobberley Jennifer M

PeerJ, 2020-10-30



Similar Datasets

| S-EPMC3384625 | biostudies-literature
| S-EPMC4771326 | biostudies-literature
| S-EPMC3664793 | biostudies-literature
| S-EPMC3374609 | biostudies-literature
| S-EPMC6549370 | biostudies-literature
| S-EPMC10928208 | biostudies-literature
| S-EPMC8504635 | biostudies-literature
| S-EPMC6138922 | biostudies-literature
| S-EPMC5887522 | biostudies-literature
| S-EPMC6936146 | biostudies-literature