ABSTRACT:
SUBMITTER: Amgarten D
PROVIDER: S-EPMC6090037 | biostudies-literature | 2018
REPOSITORIES: biostudies-literature
Deyvid Amgarten, Lucas P. P. Braga, Aline M. da Silva, João C. Setubal
Frontiers in Genetics, 2018-08-07
Here we present MARVEL, a tool for predicting double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset of 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset of 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve good performance in separating bacterial from phage sequences: gene density, ...
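As an illustration of the one feature named before the abstract is cut off, the sketch below computes gene density for a single contig. It assumes gene density means predicted genes per kilobase of contig, which the truncated text does not confirm, and it is not MARVEL's own code. The function name `gene_density`, the `(start, end)` coordinate tuples, and the example contig are all hypothetical.

```python
def gene_density(gene_coords, contig_length):
    """Return predicted genes per kilobase of contig.

    Assumption: gene density = gene count / (contig length in kb).
    gene_coords is a list of (start, end) tuples from any gene caller.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return len(gene_coords) / (contig_length / 1000)


# Hypothetical contig: 10,000 bp carrying 12 predicted genes of 750 bp each.
example_genes = [(i * 800 + 1, i * 800 + 750) for i in range(12)]
example_density = gene_density(example_genes, 10_000)
print(f"{example_density:.2f} genes per kb")
```

A value like this, computed per contig, would be one column in the feature table passed to a random forest classifier.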