Dataset Information

Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads.

ABSTRACT: Metagenomic studies, greatly advanced by the rapid development of next-generation sequencing (NGS) technologies, reveal the complex structure of microbial communities and their interactions with the environment. Because most microbes lack reference genome sequences, prokaryotic genomes must be assembled ab initio to retrieve complete coding genes from various metabolic pathways. The complex composition of microbial communities and the burden of handling vast amounts of metagenomic data pose great challenges for developing effective and efficient bioinformatic tools. Here we present MetaPA, a protein assembler based on de Bruijn graph searching in oligopeptide space that can be applied to both metagenomic and metatranscriptomic sequencing data. When public homologous protein sequences are used to guide assembly, MetaPA assembles 85% of all proteins as complete sequences, with a precision of 83%, on real high-throughput sequencing datasets. Applied to metatranscriptomic data, MetaPA identifies the majority of actively transcribed genes validated in related studies. These results suggest that MetaPA has good potential in both metagenomic and metatranscriptomic studies for characterizing the composition and abundance of microbiota.
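The abstract describes assembling proteins with a de Bruijn graph built over oligopeptides (amino-acid k-mers) instead of nucleotides. The sketch below illustrates that general idea under stated assumptions; it is not MetaPA's implementation. It translates each read in all six frames, splits at stop codons, builds a graph whose nodes are amino-acid (k-1)-mers and whose edges are k-mers, and spells maximal non-branching paths (unitigs) as candidate protein fragments. The function names, the `min_len` and `min_count` parameters, and the simple count-based error filter are hypothetical choices. MetaPA's homology-guided path selection is not modeled here.

```python
from collections import defaultdict

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate uppercase DNA codon by codon; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_peptides(read, min_len):
    """Yield stop-free peptide fragments (>= min_len residues) from all six frames."""
    rev_comp = read.translate(COMPLEMENT)[::-1]
    for strand in (read, rev_comp):
        for frame in range(3):
            for pep in translate(strand[frame:]).split("*"):
                if len(pep) >= min_len:
                    yield pep


def build_graph(peptides, k, min_count=1):
    """Amino-acid de Bruijn graph: nodes are (k-1)-mers, each kept k-mer is an edge.
    Hypothetical filter: k-mers seen fewer than min_count times are dropped."""
    counts = defaultdict(int)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            counts[pep[i:i + k]] += 1
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer, c in counts.items():
        if c >= min_count:
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return succ, pred


def unitigs(succ, pred):
    """Spell maximal non-branching paths; isolated cycles are ignored in this sketch."""
    def one_in_one_out(v):
        return len(pred.get(v, ())) == 1 and len(succ.get(v, ())) == 1

    contigs = []
    for v in set(succ) | set(pred):
        if one_in_one_out(v):
            continue  # paths start only at branching or terminal nodes
        for w in succ.get(v, ()):
            path = v + w[-1]
            while one_in_one_out(w):
                w = next(iter(succ[w]))
                path += w[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    # Two overlapping peptide fragments merge into one contig.
    succ, pred = build_graph(["MKTAYIAK", "TAYIAKQR"], k=4)
    print(unitigs(succ, pred))  # ['MKTAYIAKQR']
```

Working in amino-acid space collapses synonymous codon variation, which is one plausible reason a peptide-level graph can be simpler than a nucleotide graph for divergent coding sequences. The cost is that reads must first be translated in every frame, so spurious frames add noise that a real assembler has to filter, for example with the homologous protein guidance the abstract mentions.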

SUBMITTER: Liu J

PROVIDER: S-EPMC6412133 | biostudies-literature | 2019 Mar

REPOSITORIES: biostudies-literature


Publications

Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads.

Jiemeng Liu, Qichao Lian, Yamao Chen, Ji Qi

Nucleic Acids Research, 2019 Mar 1, issue 5


PMID: 30657979


Similar Datasets

HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly.
| S-EPMC5591975 | biostudies-literature

A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes.
| S-EPMC5114782 | biostudies-literature

MBG: Minimizer-based sparse de Bruijn Graph construction.
| S-EPMC8521641 | biostudies-literature

cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs.
| S-EPMC6612831 | biostudies-literature

Assembly of long error-prone reads using de Bruijn graphs.
| S-EPMC5206522 | biostudies-literature

Graphite: painting genomes using a colored de Bruijn graph.
| S-EPMC11497850 | biostudies-literature

Evaluating de Bruijn graph assemblers on 454 transcriptomic data.
| S-EPMC3517413 | biostudies-literature

Identifying splicing regulatory elements with de Bruijn graphs.
| S-EPMC4253301 | biostudies-literature

Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis.
| S-EPMC4896364 | biostudies-literature

Pan-genome de Bruijn graph using the bidirectional FM-index.
| S-EPMC10605969 | biostudies-literature