Dataset Information

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

ABSTRACT: DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function within microbial communities. Here we investigate the utility of Pacific Biosciences long and high accuracy circular consensus sequencing (CCS) reads for metagenomic projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT cells produced approximately 94 Mb of CCS reads from a biogas reactor microbiome sample that averaged 1319 nt in length and 99.7% accuracy. CCS data assembly generated a comparative number of large contigs greater than 1 kb, to those assembled from a ~190x larger HiSeq dataset (~18 Gb) produced from the same sample (i.e approximately 62% of total contigs). Hybrid assemblies using PacBio CCS and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length and number of large contigs. The incorporation of CCS data produced significant enhancements in taxonomic binning and genome reconstruction of two dominant phylotypes, which assembled and binned poorly using HiSeq data alone. Collectively these results illustrate the value of PacBio CCS reads in certain metagenomics applications.

SUBMITTER: Frank JA

PROVIDER: S-EPMC4860591 | biostudies-literature | 2016 May

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

Frank J A JA Pan Y Y Tooming-Klunderud A A Eijsink V G H VGH McHardy A C AC Nederbragt A J AJ Pope P B PB

Scientific reports 20160509

DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function within microbial communities. Here we investigate the utility of Pacific Biosciences long and high accuracy circular consensus sequencing (CCS) reads for metagenomic projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT ...[more]

PMID: 27156482

Similar Datasets

Project description:BackgroundTravelers' diarrhea (TD) is often caused by enterotoxigenic Escherichia coli, enteroaggregative E. coli, other bacterial pathogens, Norovirus, and occasionally parasites. Nevertheless, standard diagnostic methods fail to identify pathogens in more than 40% of TD patients. It is predicted that new pathogens may be causative agents of the disease.ResultsWe performed a comprehensive amplicon and whole genome shotgun (WGS) metagenomic study of the fecal microbiomes from 23 TD patients and seven healthy travelers, all of which were negative for the known etiologic agents of TD based on standard microbiological and immunological assays. Abnormal and diverse taxonomic profiles in TD samples were revealed. WGS reads were assembled and the resulting contigs were visualized using multiple query types. A semi-manual workflow was applied to isolate independent genomes from metagenomic pools. A total of 565 genome bins were extracted, 320 of which were complete enough to be characterized as cellular genomes; 160 were viral genomes. We made predictions of the etiology of disease for many of the individual subjects based on the properties and features of the recovered genomes. Multiple patients with low-diversity metagenomes were predominated by one to several E. coli strains. Functional annotation allowed prediction of pathogenic type in many cases. Five patients were co-infected with E. coli and other members of Enterobacteriaceae, including Enterobacter, Klebsiella, and Citrobacter; these may represent blooms of organisms that appear following secretory diarrhea. New "dark matter" microbes were observed in multiple samples. In one, we identified a novel TM7 genome that phylogenetically clustered with a sludge isolate; it carries genes encoding potential virulence factors. In multiple samples, we observed high proportions of putative novel viral genomes, some of which form clusters with the ubiquitous gut virus, crAssphage. The total relative abundance of viruses was significantly higher in healthy travelers versus TD patients.ConclusionOur study highlights the strength of assembly-based metagenomics, especially the manually curated, visualization-assisted binning of contigs, in resolving unusual and under-characterized pathogenic profiles of human-associated microbiomes. Results show that TD may be polymicrobial, with multiple novel cellular and viral strains as potential players in the diarrheal disease.

Dataset Information

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

Publications

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets