
Dataset Information


Rapid discovery of novel prophages using biological feature engineering and machine learning.


ABSTRACT: Prophages are phages that are integrated into bacterial genomes and which are key to understanding many aspects of bacterial biology. Their extreme diversity means they are challenging to detect using sequence similarity, yet this remains the paradigm and thus many phages remain unidentified. We present a novel, fast and generalizing machine learning method based on feature space to facilitate novel prophage discovery. To validate the approach, we reanalyzed publicly available marine viromes and single-cell genomes using our feature-based approaches and found consistently more phages than were detected using current state-of-the-art tools while being notably faster. This demonstrates that our approach significantly enhances bacteriophage discovery and thus provides a new starting point for exploring new biologies.

SUBMITTER: Siren K 

PROVIDER: S-EPMC7787355 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature


Publications

Rapid discovery of novel prophages using biological feature engineering and machine learning.

Sirén K, Millard A, Petersen B, Gilbert MTP, Clokie MRJ, Sicheritz-Pontén T

NAR Genomics and Bioinformatics, 2021-01-06 (1)



Similar Datasets

| S-EPMC4482677 | biostudies-literature
| S-EPMC6315676 | biostudies-literature
| S-EPMC7435601 | biostudies-literature
| S-EPMC7251299 | biostudies-literature
| S-EPMC8580432 | biostudies-literature
| S-EPMC6690680 | biostudies-literature
| S-EPMC10121677 | biostudies-literature
| S-EPMC7189237 | biostudies-literature
| S-EPMC4909287 | biostudies-literature
| S-EPMC7660369 | biostudies-literature