
Dataset Information

Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking.


ABSTRACT: Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Because first-line antibiotics are administered under standard regimens, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore allow patterns that reflect resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models, both for predicting phenotypic resistance from whole genome sequences and for identifying mutations that improve prediction, for four first-line drugs in a dataset of 13,402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining our models on a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
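
To make the single-label versus multi-label distinction concrete, the sketch below shows how the same phenotype table can be viewed either as one label vector per isolate (the multi-label view) or as four separate binary targets (the single-label view), and how pairwise resistance co-occurrence can be counted. It is only an illustration and not the authors' code: the drug abbreviations are an assumption (the abstract says only "four first-line drugs"), the phenotype rows are invented toy data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed drug abbreviations; the abstract does not name the four drugs.
DRUGS = ("INH", "RIF", "EMB", "PZA")

# Invented toy phenotypes, one row per isolate (1 = resistant, 0 = susceptible).
# In the multi-label view, each row is a single target vector.
PHENOTYPES = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
]


def single_label_targets(rows):
    """Split the label matrix into one binary target list per drug.

    This is the single-label view: each drug gets its own independent target.
    """
    return {drug: [row[i] for row in rows] for i, drug in enumerate(DRUGS)}


def cooccurrence_counts(rows):
    """Count, for each pair of drugs, the isolates resistant to both."""
    counts = Counter()
    for row in rows:
        resistant = [drug for drug, label in zip(DRUGS, row) if label == 1]
        for pair in combinations(resistant, 2):
            counts[pair] += 1
    return counts


targets = single_label_targets(PHENOTYPES)
pairs = cooccurrence_counts(PHENOTYPES)
```

Splitting the table into per-drug targets discards the pairing information that `cooccurrence_counts` recovers, which is the kind of structure a multi-label model is given the chance to use.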

SUBMITTER: Kouchaki S 

PROVIDER: S-EPMC7188832 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking.

Samaneh Kouchaki, Yang Yang, Alexander Lachapelle, Timothy M. Walker, A. Sarah Walker, Timothy E. A. Peto, Derrick W. Crook, David A. Clifton

Frontiers in Microbiology, 2020-04-22



Similar Datasets

| S-EPMC6929337 | biostudies-literature
| S-EPMC5423585 | biostudies-literature
| S-EPMC3516432 | biostudies-literature
| S-EPMC2637921 | biostudies-literature
| S-EPMC9356072 | biostudies-literature
| S-EPMC8154215 | biostudies-literature
| S-EPMC2916923 | biostudies-literature
| S-EPMC3218317 | biostudies-literature
| S-EPMC8257600 | biostudies-literature
| S-EPMC5595802 | biostudies-other