Dataset Information

MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms.


ABSTRACT:

BACKGROUND: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of the sequenced reads may therefore be classified as unknown, which greatly impairs our understanding of the whole sample.

RESULT: Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs for acceleration. One million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it with multiple popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.

CONCLUSION: Compared with existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms.

REVIEWERS: This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
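
The abstract says the method is based on sequence composition but does not give details here. As a rough illustration of what a composition feature can look like in general, the sketch below computes the relative k-mer frequencies of a single read. The function name `kmer_composition`, the choice of `k`, and the handling of ambiguous bases are assumptions made for this example; this is not MetaBinG2's actual model or code.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer in one sequence.

    Hypothetical helper for illustration only; not part of MetaBinG2.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-mers so every profile has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are not in `kmers`
    # and are therefore left out of the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


read = "ACGTACGTAC"
profile = kmer_composition(read, k=2)
```

A profile like this has a fixed length (4**k entries) regardless of read length, which is what makes composition features usable without aligning a read to a reference genome.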

SUBMITTER: Qiao Y 

PROVIDER: S-EPMC6104016 | biostudies-literature | 2018 Aug

REPOSITORIES: biostudies-literature

Publications

MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms.

Qiao Yuyang, Jia Ben, Hu Zhiqiang, Sun Chen, Xiang Yijin, Wei Chaochun

Biology Direct, 2018-08-22, issue 1


Similar Datasets

| S-EPMC4428112 | biostudies-literature
| S-EPMC5389551 | biostudies-literature
| S-EPMC4632058 | biostudies-literature
| S-EPMC3953531 | biostudies-literature
| S-EPMC4315456 | biostudies-literature
| S-EPMC4053813 | biostudies-other
| S-EPMC3319535 | biostudies-literature
| S-EPMC3549735 | biostudies-literature
| S-EPMC3424124 | biostudies-literature
| S-EPMC2957682 | biostudies-literature