Dataset Information

Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization.


ABSTRACT: The Pseudomonas syringae species complex (PSSC) is a diverse group of plant pathogens with a collective host range encompassing almost every food crop grown today. Because the complex is a threat to global food security, rapid detection and characterization of epidemic and emerging pathogenic lineages are essential. However, phylogenetic identification is often complicated by an unclarified and ever-changing taxonomy, making practical use of available databases and the proper training of classifiers difficult. As such, while amplicon sequencing is a common method for routine identification of PSSC isolates, there is no efficient method for accurate classification based on these data. Here we present a suite of five Naïve Bayes classifiers for PCR primer sets widely used for PSSC identification, trained on in silico amplicon data from 2,161 published PSSC genomes using the life identification number (LIN) hierarchical clustering algorithm in place of traditional Linnaean taxonomy. Additionally, we include a dataset for translating classification results back into traditional taxonomic nomenclature (i.e., species, phylogroup, pathovar), and for predicting virulence factor repertoires.
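
ILLUSTRATION: The abstract does not describe how the classifiers are implemented, so the following is only a minimal sketch of the general technique it names: a multinomial Naïve Bayes classifier over k-mer counts of amplicon sequences. The class name KmerNaiveBayes, the k-mer length, the toy sequences, and the labels "LIN_A" and "LIN_B" are all hypothetical placeholders standing in for LIN-derived groups; none of them come from the dataset or the authors' pipeline.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=4):
    """Return all overlapping k-mers of a nucleotide sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial Naive Bayes over k-mer counts (illustrative only)."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k                      # k-mer length (assumed value)
        self.alpha = alpha              # Laplace smoothing constant
        self.class_counts = Counter()   # training sequences per label
        self.kmer_counts = defaultdict(Counter)  # k-mer counts per label
        self.vocab = set()              # all k-mers seen in training

    def fit(self, seqs, labels):
        for seq, label in zip(seqs, labels):
            self.class_counts[label] += 1
            for km in kmers(seq, self.k):
                self.kmer_counts[label][km] += 1
                self.vocab.add(km)
        return self

    def log_scores(self, seq):
        """Unnormalized log posterior for each label."""
        n_train = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.kmer_counts[label].values())
            score = math.log(n_label / n_train)  # log prior
            for km in kmers(seq, self.k):
                if km not in self.vocab:
                    continue  # k-mers never seen in training are skipped
                count = self.kmer_counts[label][km]
                score += math.log((count + self.alpha) /
                                  (total + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Hypothetical toy amplicons and placeholder labels, not real PSSC data.
train_seqs = ["ATGCGTACGTTAGC", "ATGCGTACGATAGC",
              "GGCTTAACCGGTTA", "GGCTTAACCGGATA"]
train_labels = ["LIN_A", "LIN_A", "LIN_B", "LIN_B"]

model = KmerNaiveBayes(k=4).fit(train_seqs, train_labels)
prediction = model.predict("ATGCGTACGTTAGG")
print(prediction)
```

In a sketch like this, the label assigned to each training amplicon would be a group derived from LIN clustering rather than a Linnaean name, and a separate lookup table would map the predicted group back to species, phylogroup, or pathovar.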

SUBMITTER: Fautt C 

PROVIDER: S-EPMC10850129 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature

Publications

Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization.

Chad Fautt, Estelle Couradeau, Kevin L. Hockett

Scientific Data, 2024-02-07, issue 1


Similar Datasets

| S-EPMC4219333 | biostudies-literature
| S-EPMC5860114 | biostudies-literature
| S-EPMC94769 | biostudies-literature
| S-EPMC3597554 | biostudies-literature
| S-EPMC6559019 | biostudies-literature
| S-EPMC9929173 | biostudies-literature
| S-EPMC10919322 | biostudies-literature
| S-EPMC309951 | biostudies-literature
| S-EPMC3165696 | biostudies-literature
| S-EPMC3207962 | biostudies-literature