
Dataset Information


PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data.


ABSTRACT: The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end, we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with a human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction, whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.
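The abstract's central contrast is that a trained classifier scores every read, while a similarity search returns nothing for reads with no close reference. The sketch below illustrates that idea only; the abstract does not state which read features or classifier PaPrBaG uses. It assumes k-mer frequency profiles as features and swaps in a simple nearest-centroid classifier, and the names `kmer_profile`, `NearestCentroidReadClassifier` and `combined_call`, the choice `K = 3`, and the "trust a similarity hit, else use the ML score" combination rule are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 3  # hypothetical k-mer length, chosen only for illustration
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(read):
    """Normalized k-mer frequency vector for one read (k-mers with non-ACGT bases are ignored)."""
    counts = Counter(read[i:i + K] for i in range(len(read) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1
    return [counts[k] / total for k in KMERS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidReadClassifier:
    """Stand-in classifier (not PaPrBaG's): one centroid per label, True = pathogenic."""

    def fit(self, reads, labels):
        by_label = {}
        for read, label in zip(reads, labels):
            by_label.setdefault(label, []).append(kmer_profile(read))
        self.centroids = {lab: centroid(vecs) for lab, vecs in by_label.items()}
        return self

    def pathogenic_score(self, read):
        """Score in [0, 1] for any read; higher means closer to the pathogenic centroid."""
        p = kmer_profile(read)
        d_path = distance(p, self.centroids[True])
        d_non = distance(p, self.centroids[False])
        if d_path + d_non == 0:
            return 0.5
        return d_non / (d_path + d_non)


def combined_call(similarity_label, ml_score, threshold=0.5):
    """Hypothetical combination: use a similarity-based label if one exists, else the ML score."""
    if similarity_label is not None:
        return similarity_label
    return ml_score >= threshold
```

A toy usage with synthetic reads (not real genomes): fit on `["ATATATATATAT", "TATATATATA", "GCGCGCGCGC", "CGCGCGCGCG"]` with labels `[True, True, False, False]`, then call `pathogenic_score` on any new read. Every read receives a score, so `combined_call(None, score)` still yields a decision for reads that a similarity search would have discarded.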

SUBMITTER: Deneke C 

PROVIDER: S-EPMC5209729 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature


Publications

PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data.

Carlus Deneke, Robert Rentzsch, Bernhard Y. Renard

Scientific Reports, 2017 Jan 4



Similar Datasets

| S-EPMC10871075 | biostudies-literature
| S-EPMC7591033 | biostudies-literature
| S-EPMC2585161 | biostudies-literature
| S-EPMC8336856 | biostudies-literature
| S-EPMC10703010 | biostudies-literature
| S-EPMC3704944 | biostudies-literature
| S-EPMC8434514 | biostudies-literature
| S-EPMC8768027 | biostudies-literature
| S-EPMC9112524 | biostudies-literature
| S-EPMC4184266 | biostudies-literature