
Dataset Information


Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection.


ABSTRACT:

Background

Several studies have demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences. The prediction process usually amounts to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations within these genes, previously identified from a training panel of strains. We address the problem from a supervised statistical learning perspective, without relying on prior information about such resistance factors. We use a k-mer-based genotyping scheme and a logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach (Meinshausen et al., J R Stat Soc Ser B 72:417-73, 2010), which consists of fitting Lasso-penalized logistic regression models coupled with extensive resampling procedures.
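The selection step described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation: the solver (plain proximal gradient descent), the half-size subsamples, the penalty grid, the number of subsamples and the 0.6 frequency threshold are illustrative assumptions, and all function names are hypothetical. Strains are encoded as 0/1 vectors of k-mer presence, and a k-mer is kept when a Lasso-penalized logistic regression selects it in a large enough fraction of random subsamples.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, n_iter=200):
    """Lasso-penalized logistic regression fitted by proximal gradient (ISTA).

    X: list of 0/1 k-mer presence vectors; y: list of 0/1 labels
    (1 = resistant). The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    # Step 1/L, where L = (p + 1) / 4 bounds the Lipschitz constant of the
    # mean logistic loss for binary features plus an intercept.
    step = 4.0 / (p + 1)
    for _ in range(n_iter):
        # Residuals: predicted probability of resistance minus observed label.
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
             for xi, yi in zip(X, y)]
        for j in range(p):
            g = sum(ri * xi[j] for ri, xi in zip(r, X)) / n
            u = w[j] - step * g
            # Soft-thresholding is the proximal operator of the L1 penalty.
            w[j] = math.copysign(max(abs(u) - step * lam, 0.0), u)
        b -= step * sum(r) / n
    return w, b


def stability_selection(X, y, lambdas, n_subsamples=30, threshold=0.6, seed=0):
    """Stability selection in the spirit of Meinshausen et al. (2010).

    For each random half of the strains and each penalty in `lambdas`, fit an
    L1-logistic model and record which k-mers receive a non-zero weight. A
    k-mer is stable if its selection frequency, maximized over `lambdas`,
    reaches `threshold`.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        for lam in lambdas:
            w, _ = fit_l1_logistic(Xs, ys, lam)
            for j, wj in enumerate(w):
                if wj != 0.0:
                    counts[lam][j] += 1
    freq = [max(counts[lam][j] for lam in lambdas) / n_subsamples
            for j in range(p)]
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq


# Toy panel of 40 strains: k-mer 0 is present exactly in the resistant
# strains, k-mers 1 to 5 are random noise.
rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label] + [rng.randint(0, 1) for _ in range(5)])
    y.append(label)

stable, freq = stability_selection(X, y, lambdas=[0.05, 0.1, 0.15])
print("stable k-mers:", stable)
print("selection frequencies:", [round(f, 2) for f in freq])
```

On real data the k-mer matrix has far more columns than strains, so a dedicated L1-logistic solver from an established library would normally replace the toy ISTA loop; the resampling and frequency-thresholding logic stays the same.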

Results

Using public datasets, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to the state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, and are therefore remarkably easy and fast to evaluate on new genomes (whether from raw reads or assemblies).
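Because such models involve only a handful of k-mers, scoring a new genome reduces to checking which of those k-mers occur and summing their weights. The Python sketch below assumes canonical (strand-independent) k-mers and presence/absence features; the function names, the k-mer length, the k-mers, the weights and the intercept in the usage lines are hypothetical placeholders, not values from the paper.

```python
import math

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    # A k-mer and its reverse complement come from the same double-stranded
    # locus; keep the lexicographically smaller one as the representative.
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers_present(sequences, wanted, k):
    """Return which canonical k-mers from `wanted` occur in `sequences`.

    `sequences` may be raw reads or assembled contigs: only presence matters.
    """
    found = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            km = canonical(seq[i:i + k])
            if km in wanted:
                found.add(km)
    return found


def predict_resistance(sequences, weights, intercept, k):
    """Probability of resistance under a sparse logistic model over k-mers."""
    canon = {canonical(km.upper()): w for km, w in weights.items()}
    found = kmers_present(sequences, set(canon), k)
    z = intercept + sum(w for km, w in canon.items() if km in found)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-k-mer model; k = 8 only to keep the example short.
model = {"ACGTTGCA": 2.5, "GGGCCCAA": 1.0}
reads = ["TTACGTTGCATT", "CCCCCCCC"]
p_res = predict_resistance(reads, model, intercept=-2.0, k=8)
print(round(p_res, 3))
```

Scanning every window once is linear in the total sequence length, which is what keeps evaluation of a model with so few k-mers cheap; a production tool would more likely query a precomputed k-mer index.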

Conclusion

Our proof of concept therefore demonstrates that stability selection is a powerful approach to investigate bacterial genotype-phenotype relationships.

SUBMITTER: Mahe P 

PROVIDER: S-EPMC6192184 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature


Publications

Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection.

Pierre Mahé, Maud Tournoud

BMC Bioinformatics, 2018 Oct 17 (issue 1)



Similar Datasets

| S-EPMC5038937 | biostudies-literature
| S-EPMC4330336 | biostudies-literature
| S-EPMC7526191 | biostudies-literature
| S-EPMC4538840 | biostudies-literature
| S-EPMC6325232 | biostudies-literature
| S-EPMC8269213 | biostudies-literature