
Dataset Information


CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning.


ABSTRACT: The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
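
The abstract mentions k-mer embedding-based encoding without giving details. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way to turn a read into integer k-mer indices that an embedding layer could consume. The function names, the choice of k = 3, and the use of index 0 for k-mers containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def build_kmer_vocab(k):
    """Map every k-mer over A, C, G, T to an index starting at 1.

    Index 0 is reserved (an assumption of this sketch) for k-mers that
    contain any other character, such as the ambiguous base N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode_read(read, k, vocab):
    """Slide a window of width k over the read and look up each k-mer."""
    read = read.upper()
    return [vocab.get(read[i:i + k], 0) for i in range(len(read) - k + 1)]


# Hypothetical usage with k = 3; the value is chosen only for illustration.
vocab = build_kmer_vocab(3)
tokens = encode_read("ACGTNACG", 3, vocab)
print(tokens)
```

The resulting index sequence is the kind of input that a learned embedding followed by convolutional layers would take; the hierarchical classifiers and the rejection layer described in the abstract are not sketched here.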

SUBMITTER: Shang J 

PROVIDER: S-EPMC7255349 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning.

Shang Jiayu, Sun Yanni

Methods (San Diego, Calif.), 2020-05-23


Similar Datasets

| S-EPMC6069770 | biostudies-literature
| S-EPMC7671387 | biostudies-literature
| S-EPMC8172088 | biostudies-literature
| EMPIAR-10069 | biostudies-other
| S-EPMC3152360 | biostudies-literature
| S-EPMC8665375 | biostudies-literature
| S-EPMC8594867 | biostudies-literature
| S-EPMC7551840 | biostudies-literature
| S-EPMC5389551 | biostudies-literature
| S-EPMC8266618 | biostudies-literature