Dataset Information

Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models.


ABSTRACT: Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning methodology, we tested three random forest (RF) algorithms (Guided, Ordinary, and Regularized) and 53 GAS response regulator (RR) allele types to infer six genomic traits (emm-type, emm-subtype, tissue and country of sample, clinical outcomes, and isolate invasiveness). The Guided, Ordinary, and Regularized RF classifiers inferred the emm-type with accuracies of 96.7%, 95.7%, and 95.2%, using ten, three, and four RR alleles in the feature set, respectively. Notably, we inferred the emm-type with 93.7% accuracy using only mga2 and lrp. We demonstrated a utility for inferring emm-subtype (89.9%), country (88.6%), and invasiveness (84.7%), but not clinical outcome (56.9%) or tissue (56.4%), which is consistent with the complexity of GAS pathophysiology. We identified a novel cell wall-spanning domain (SF5), and proposed evolutionary pathways depicting the 'contrariwise' and 'likewise' chimeric deletion-fusion of emm and enn. We identified an intermediate strain, which provides evidence of the time-dependent excision of mga regulon genes. Overall, our workflow advances the understanding of the GAS mga regulon and its plasticity.
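
The supervised step described in the abstract (categorical allele types in, a trait label out) can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a plain bagged ensemble of small decision trees written with the Python standard library, standing in for an ordinary random forest only, and it does not attempt the Guided or Regularized variants. The allele numbers and the labels typeA, typeB, and typeC are invented toy values; only the locus names mga2 and lrp come from the abstract, and every function name is hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, n_try, rng):
    """Grow one tree; each split tests 'allele at locus f == value'."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    best = None
    # Random feature subset at every node, as in a random forest.
    for f in rng.sample(range(len(rows[0])), n_try):
        for value in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] == value]
            right = [i for i, r in enumerate(rows) if r[f] != value]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, value, left, right)
    if best is None:
        # No usable split: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    _, f, value, left, right = best
    return (f, value,
            build_tree([rows[i] for i in left], [labels[i] for i in left], n_try, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], n_try, rng))

def predict_tree(node, row):
    """Walk internal nodes (tuples) until a leaf label (str) is reached."""
    while isinstance(node, tuple):
        f, value, left, right = node
        node = left if row[f] == value else right
    return node

def fit_forest(rows, labels, n_trees=25, n_try=1, seed=0):
    """Train each tree on a bootstrap resample of the isolates."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Invented toy data: (mga2 allele, lrp allele) per isolate, with made-up labels.
rows = [(1, 1), (1, 1), (2, 1), (2, 1), (3, 2), (3, 2)]
labels = ["typeA", "typeA", "typeB", "typeB", "typeC", "typeC"]

forest = fit_forest(rows, labels)
predictions = [predict_forest(forest, r) for r in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The accuracy printed here is measured on the training rows of a toy table, so it says nothing about the figures reported in the abstract; a real evaluation would need held-out isolates or out-of-bag estimates.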

SUBMITTER: Buckley SJ 

PROVIDER: S-EPMC8209152 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

2012-05-10 | GSE37858 | GEO
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress
2022-05-16 | GSE189510 | GEO
| S-EPMC8719667 | biostudies-literature
| S-EPMC7439995 | biostudies-literature
| S-EPMC8575902 | biostudies-literature
| S-EPMC6660107 | biostudies-literature
| S-EPMC3597546 | biostudies-literature
| S-EPMC4098612 | biostudies-other
| S-EPMC4794775 | biostudies-literature