Dataset Information

Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data.

ABSTRACT:

Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations that predict antimicrobial resistance, thereby providing a diagnostic tool to assist clinical decision making.

Methods: We applied machine learning approaches to 16,688 M. tuberculosis isolates that have undergone WGS and laboratory drug-susceptibility testing (DST) across 14 antituberculosis drugs, with 22.5% of samples being multidrug resistant and 2.1% being extensively drug resistant. We used non-parametric classification-tree and gradient-boosted-tree models to predict drug resistance and uncover any associated novel putative mutations. We fitted separate models for each drug, with and without "co-occurrent resistance" markers known to cause resistance to drugs other than the one of interest. Predictive performance was measured using sensitivity, specificity, and the area under the receiver operating characteristic curve, assuming DST results as the gold standard.

Results: The predictive performance was highest for resistance to first-line drugs, amikacin, kanamycin, ciprofloxacin, moxifloxacin, and multidrug-resistant tuberculosis (area under the receiver operating characteristic curve above 96%), and lowest for third-line drugs such as D-cycloserine and para-aminosalicylic acid (area under the curve below 85%). The inclusion of co-occurrent resistance markers led to improved performance for some drugs and superior results when compared to similar models in other large-scale studies, which had smaller sample sizes. Overall, the gradient-boosted-tree models performed better than the classification-tree models. The mutation-rank analysis detected no new single nucleotide polymorphisms linked to drug resistance. Discordance between DST and genotypically inferred resistance may be explained by DST errors, novel rare mutations, hetero-resistance, and nongenomic drivers such as efflux-pump upregulation.

Conclusion: Our work demonstrates the utility of machine learning as a flexible approach to drug resistance prediction that is able to accommodate a much larger number of predictors and to summarize their predictive ability, thus assisting clinical decision making and single nucleotide polymorphism detection in an era of increasing WGS data generation.
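The three performance measures named in the abstract (sensitivity, specificity, and area under the receiver operating characteristic curve, with DST as the gold standard) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy values in `dst` and `scores` are all assumptions made up for illustration, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than any particular library.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(dst: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating DST labels (1 = resistant) as the gold standard."""
    tp = sum(1 for t, p in zip(dst, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(dst, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(dst, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(dst, predicted) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(dst: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a resistant isolate outscores a susceptible one (ties count 0.5)."""
    resistant = [s for t, s in zip(dst, scores) if t == 1]
    susceptible = [s for t, s in zip(dst, scores) if t == 0]
    if not resistant or not susceptible:
        raise ValueError("need at least one resistant and one susceptible isolate")
    wins = sum(
        1.0 if r > s else 0.5 if r == s else 0.0
        for r in resistant
        for s in susceptible
    )
    return wins / (len(resistant) * len(susceptible))


# Hypothetical toy data for one drug: DST phenotype and model-predicted resistance probability.
dst = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Assumed decision threshold of 0.5; the source does not state one.
predicted = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(dst, predicted)
auc = roc_auc(dst, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason a study of this kind might report all three.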

SUBMITTER: Deelder W

PROVIDER: S-EPMC6775242 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data.

Deelder W, Christakoudi S, Phelan J, Benavente ED, Campino S, McNerney R, Palla L, Clark TG

Frontiers in Genetics, 2019-09-26

PMID: 31616478

Similar Datasets

Project description: Combating the spread of drug-resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection, sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drug-binding sites. Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to the identification of novel resistance mutations to improve the design of tuberculosis control measures, such as diagnostics, and inform patient management.

Project description: Background: Tuberculosis, caused by Mycobacterium tuberculosis, is one of the deadliest diseases. Its treatment remains a burden for many countries, including Indonesia, and drug resistance is one of the main problems in TB treatment. Developments in the molecular field, such as whole-genome sequencing (WGS), can help detect mutations associated with resistance to TB drugs. This investigation aimed to use WGS data to support the scientific community in understanding TB epidemiology and evolution in Papua, and to detect mutations in genes associated with TB drugs. Results: Whole-genome sequencing was performed on random samples from the TB Referral Laboratory in Papua using the MiSeq 600-cycle Reagent Kit (V3). TBProfiler was used for genome analysis, the RAST Server for annotation, the GView server for BLAST genome mapping, and the MicroScope server for Regions of Genomic Plasticity (RGP). The largest M. tuberculosis genome obtained was 4,396,040 bp, with 309 subsystems and 4326 coding sequences. One sample (TB751) contained one RGP. The drug resistance analysis revealed several mutations associated with TB-drug resistance. In detail, rpoB mutations S450L, D435Y, H445Y, L430P, and Q432K were associated with reduced effectiveness of rifampicin, while mutations in katG (S315T), kasA (312S), inhA (I21V), and Rv1482c-fabG1 (C-15T) contributed to isoniazid resistance. Streptomycin resistance was associated with mutations in rpsL (K43R) and rrs (A514C, A514T), and amikacin resistance with a mutation in rrs (A514C). For ethambutol and pyrazinamide, reduced effectiveness was associated with mutations in embB (M306L, M306V, D1024N) and pncA (W119R), respectively. Conclusions: The results from whole-genome sequencing of TB clinical samples in Papua, Indonesia could contribute to the surveillance of TB-drug resistance. The drug resistance profile included 15 multidrug-resistant (MDR) samples. No extensively drug-resistant (XDR) samples were found, although some samples were resistant to a single second-line drug, amikacin.
