
Dataset Information


Metagenome fragment classification based on multiple motif-occurrence profiles.


ABSTRACT: A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. Analyzing these data makes it possible to discover novel microbes and elucidate new microbial functions. The first step in this analysis is to classify each sequenced read into the reference genome from which it was most likely derived. The Naïve Bayes Classifier is one method for this classification. To identify the origin of each read, it calculates a score based on the occurrence of DNA sequence motifs in each reference genome. However, large differences in the sizes of the reference genomes can bias these scores, which may cause misclassification and reduce accuracy. To address this issue, we extended the Naïve Bayes Classifier method to use multiple occurrence profiles for each reference genome. To normalize for genome size, each genome sequence is divided into subsequences of similar length, and a profile is generated for each subsequence. This multiple-profile strategy improved the classification accuracy of the Naïve Bayes Classifier method on simulated and Sargasso Sea datasets.
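The multiple-profile idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Motifs are assumed to be overlapping k-mers, and each profile is assumed to use add-one (Laplace) smoothing over all 4^k possible DNA k-mers. A read's score against a profile is taken to be the Naïve Bayes log-likelihood of its k-mers, and a genome's score is assumed to be the maximum over its subsequence profiles. The function names (`split_genome`, `build_profile`, `build_multi_profiles`, `log_likelihood`, `classify`) and the values of `k` and `chunk_len` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer (assumed motif definition) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def split_genome(seq: str, chunk_len: int) -> list[str]:
    """Divide a genome into consecutive subsequences of roughly chunk_len bases."""
    n_chunks = max(1, round(len(seq) / chunk_len))
    size = max(1, math.ceil(len(seq) / n_chunks))
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_profile(seq: str, k: int) -> tuple[Counter, int]:
    """Motif-occurrence profile: k-mer counts plus a smoothed denominator."""
    counts = Counter(kmers(seq, k))
    # Assumption: one pseudocount for each of the 4^k possible DNA k-mers.
    denom = sum(counts.values()) + 4 ** k
    return counts, denom


def build_multi_profiles(genomes: dict[str, str], k: int, chunk_len: int) -> dict[str, list]:
    """Build one profile per similar-length subsequence of each reference genome."""
    return {
        name: [build_profile(chunk, k) for chunk in split_genome(seq, chunk_len)]
        for name, seq in genomes.items()
    }


def log_likelihood(read: str, profile: tuple[Counter, int], k: int) -> float:
    """Naive Bayes score: sum of log P(k-mer | profile), treating k-mers as independent."""
    counts, denom = profile
    return sum(math.log((counts[km] + 1) / denom) for km in kmers(read, k))


def classify(read: str, profiles: dict[str, list], k: int) -> str:
    """Assign the read to the genome whose best subsequence profile scores highest.

    Assumption: a genome's score is the maximum over its subsequence profiles.
    """
    return max(
        profiles,
        key=lambda name: max(log_likelihood(read, p, k) for p in profiles[name]),
    )


if __name__ == "__main__":
    # Toy references of different sizes (2000 bp vs 1000 bp), for illustration only.
    genomes = {"genome_A": "AAAT" * 500, "genome_B": "GGGC" * 250}
    profiles = build_multi_profiles(genomes, k=4, chunk_len=1000)
    print(classify("AAATAAATAAAT", profiles, k=4))  # genome_A
    print(classify("GGGCGGGCGGGC", profiles, k=4))  # genome_B
```

Each genome is split into chunks of roughly `chunk_len` bases, so every profile is estimated from a similar amount of sequence. A large genome therefore no longer contributes a single profile built from far more data than a small genome's. This mirrors the size normalization the abstract describes, although the chunk length and the way the paper combines per-subsequence scores are not stated here.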

SUBMITTER: Matsushita N 

PROVIDER: S-EPMC4157293 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

Metagenome fragment classification based on multiple motif-occurrence profiles.

Naoki Matsushita, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda

PeerJ, 4 September 2014



Similar Datasets

| S-EPMC4769744 | biostudies-literature
| S-EPMC5023760 | biostudies-literature
| S-EPMC5181567 | biostudies-literature
| S-EPMC3125260 | biostudies-literature
| S-EPMC6581753 | biostudies-literature
| S-EPMC1160114 | biostudies-literature
| S-EPMC6854650 | biostudies-literature
| S-EPMC2951702 | biostudies-literature
| S-EPMC4889956 | biostudies-literature
| S-EPMC4508914 | biostudies-literature