Dataset Information

Tackling soil diversity with the assembly of large, complex metagenomes.

ABSTRACT: The large volumes of sequencing data required to sample deeply the microbial communities of complex environments pose new challenges to sequence analysis. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires substantial computational resources. We combine two preassembly filtering approaches--digital normalization and partitioning--to generate previously intractable large metagenome assemblies. Using a human-gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes totaling 398 billion bp (equivalent to 88,000 Escherichia coli genomes) from matched Iowa corn and native prairie soils. The resulting assembled contigs could be used to identify molecular interactions and reaction networks of known metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes Orthology database. Nonetheless, more than 60% of predicted proteins in assemblies could not be annotated against known databases. Many of these unknown proteins were abundant in both corn and prairie soils, highlighting the benefits of assembly for the discovery and characterization of novelty in soil biodiversity. Moreover, 80% of the sequencing data could not be assembled because of low coverage, suggesting that considerably more sequencing data are needed to characterize the functional content of soil.

SUBMITTER: Howe AC

PROVIDER: S-EPMC3977251 | biostudies-literature | 2014 Apr

REPOSITORIES: biostudies-literature

Publications

Tackling soil diversity with the assembly of large, complex metagenomes.

Howe, Adina Chuang; Jansson, Janet K; Malfatti, Stephanie A; Tringe, Susannah G; Tiedje, James M; Brown, C Titus

Proceedings of the National Academy of Sciences of the United States of America, 2014-03-14, issue 13

PMID: 24632729

Similar Datasets

Project description:Soil metagenomics has been touted as the "grand challenge" for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of "Candidatus Pseudomonas sp. strain JKJ-1" from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. 
However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available.

Project description: Purpose: The chromosomal rearrangements of the mixed-lineage leukemia (MLL) gene have been extensively characterized as a potent oncogenic driver on the molecular and mechanistic level in acute lymphoblastic (ALL) and acute myeloid (AML) leukemia. For its oncogenic function, the MLL fusion protein hijacks the multi-enzyme super elongation complex (SEC), leading to elevated expression of MLL target genes (e.g. HOXA9 and MEIS). High expression of MLL target genes overwrites the normal hematopoietic differentiation gene expression program, resulting in undifferentiated blast cells with a more “stem-cell like” cancer-promoting phenotype. Although extensive resources have been devoted to a better understanding of therapeutic targets for the MLL fusion to overcome the de-differentiation, the inter-dependencies of those targets for the pathophysiology of MLL are still barely understood. Here we report a comparative mode of action analysis of different inhibitors potentially interfering with MLL fusion induced differentiation blockade. We used RNA-seq for transcriptomic profiling in 14 AML and ALL cell lines treated with DMSO control or one of the 5 studied inhibitors (EPZ-5676, Brequinar, BAY-155, BAY-1251152 and OTX015). Methods: We used RNA-seq for transcriptomic profiling in 14 AML and ALL cell lines treated with DMSO control or one of the 5 studied inhibitors (EPZ-5676, Brequinar, BAY-155, BAY-1251152 and OTX015). Results: We discovered significant differences between compounds in their ability to induce differentiation and interfere with MLL target gene expression. We observed that Menin and DOT1L inhibition act very specifically on MLL fused leukemia cell lines, whereas inhibition of BET, DHODH and P-TEFb has strong effects beyond the MLL fusion.
Conclusions: These results show a substantial diversity in the molecular activities of those inhibitors and provide valuable insights into the further developmental potential as single agents or in combinations in MLL fused leukemias.
