Dataset Information

Classification of metagenomics data at lower taxonomic level using a robust supervised classifier.


ABSTRACT: As more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Several supervised classifiers have already been developed to assign metagenomic sequences to their source organisms. We have found that existing supervised classifiers usually cannot discriminate accurately between training data from different classes when the data contain outliers. Yet the training genomic data (bacterial and archaeal genomes) usually contain a portion of outliers, which arise from sequencing errors, phage invasions, highly expressed genes, and other sources. These outliers act as noise and hinder the development of classifiers with better prediction accuracy. To solve this problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference of outliers in the training genomic data and thereby generate more accurate data domain descriptions for each taxonomic class. The experimental results demonstrate that WSVDD is more robust than other classifiers for simulated Sanger and 454 reads with different outlier rates. In addition, in experiments performed on simulated metagenomes and real gut metagenomes, WSVDD also achieved better prediction accuracy than other classifiers.
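
The idea of a per-class domain description that down-weights outliers can be illustrated with a small sketch. This is not the authors' WSVDD, which is a kernel-based support vector method; it is a deliberately simplified stand-in in which each class is summarized by a weighted center and radius. The function names (`fit_domain`, `classify`), the weighting formula, and the 2-D toy points used in place of real sequence features are all assumptions made for illustration.

```python
import math
from statistics import median


def weighted_center(points, weights):
    """Weighted mean of a list of equal-length coordinate tuples."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def fit_domain(points, iterations=5):
    """Hypothetical helper: describe one class by (center, radius).

    Points far from the current center, relative to the median distance,
    receive small weights, so outliers barely move the center.
    """
    weights = [1.0] * len(points)
    for _ in range(iterations):
        center = weighted_center(points, weights)
        dists = [math.dist(p, center) for p in points]
        scale = median(dists) or 1.0  # avoid division by zero
        weights = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]
    center = weighted_center(points, weights)
    dists = [math.dist(p, center) for p in points]
    # Radius: weighted mean distance to the center (an assumed choice).
    radius = sum(w * d for w, d in zip(weights, dists)) / sum(weights)
    return center, radius


def classify(point, domains):
    """Assign the point to the class whose domain it is closest to,
    measured as distance to the center divided by the class radius."""
    def score(label):
        center, radius = domains[label]
        return math.dist(point, center) / max(radius, 1e-12)
    return min(domains, key=score)


# Toy training data: class "A" contains one gross outlier at (10, 10).
training = {
    "A": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
          (-0.2, -0.1), (10.0, 10.0)],
    "B": [(5.0, 5.0), (5.2, 5.1), (4.9, 5.2), (5.1, 4.8), (4.8, 4.9)],
}
domains = {label: fit_domain(pts) for label, pts in training.items()}

label_near_a = classify((0.1, 0.1), domains)
label_near_b = classify((5.1, 4.9), domains)
```

With equal weights, the single outlier would pull the center of class "A" to roughly (1.67, 1.67); the reweighting keeps it near the bulk of the class, which is the effect the abstract attributes to weighting the training data.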

SUBMITTER: Hou T 

PROVIDER: S-EPMC4309676 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Classification of metagenomics data at lower taxonomic level using a robust supervised classifier.

Hou Tao, Liu Fu, Liu Yun, Zou Qing Yu, Zhang Xiao, Wang Ke

Evolutionary Bioinformatics Online, 2015-01-26


Similar Datasets

S-EPMC11316826 | biostudies-literature
S-EPMC4833860 | biostudies-other
S-EPMC6716367 | biostudies-literature
S-EPMC7671387 | biostudies-literature
S-EPMC10769086 | biostudies-literature
S-EPMC10794705 | biostudies-literature
S-EPMC9492272 | biostudies-literature
S-EPMC7498351 | biostudies-literature
S-EPMC6264004 | biostudies-literature
S-EPMC2770370 | biostudies-literature