Dataset Information

Prediction of drug metabolites using neural machine translation.

ABSTRACT: Metabolic processes in the human body can alter the structure of a drug, affecting its efficacy and safety. As a result, investigating the metabolic fate of a candidate drug is an essential part of drug design studies. Computational approaches have been developed to predict possible drug metabolites and so assist the traditional, resource-demanding experimental route. Current methodologies are based on metabolic transformation rules, which are tied to specific enzyme families and therefore generalize poorly, and which may additionally require manual work from experts, limiting scalability. We present a rule-free, end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs. The metabolite prediction task is approached as a sequence translation problem, with chemical compounds represented in SMILES notation. We perform transfer learning on a deep learning transformer model for sequence translation, originally trained on chemical reaction data, to predict the outcome of human metabolic reactions. We further build an ensemble model to account for multiple and diverse metabolites. Extensive evaluation reveals that the proposed method generalizes well to different enzyme families, as it correctly predicts metabolites formed through phase I and phase II drug metabolism as well as through other enzymes. Compared to existing rule-based approaches, our method has equivalent performance on the major enzyme families while additionally finding metabolites formed through less common enzymes. Our results indicate that the proposed approach can provide a comprehensive study of drug metabolism that is not restricted to the major enzyme families and does not require the extraction of transformation rules.
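To illustrate what treating SMILES strings as sequences for translation can look like in practice, the following is a minimal sketch of a regex-based SMILES tokenizer. The abstract does not state how the sequences are tokenized, so the regular expression, the function name `tokenize_smiles`, and the space-separated output format are assumptions made for illustration only, not the authors' implementation.

```python
import re
from typing import List

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms, two-letter halogens, single-letter atoms, bond/branch
# symbols, two-digit ring closures (%NN), and single-digit ring closures.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[A-Za-z]|[=#$:/\\\-+().~*]|%\d{2}|\d)"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens suitable for a sequence model."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse to silently drop characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES string: {smiles!r}")
    return tokens


def to_model_input(smiles: str) -> str:
    """Join tokens with spaces, a common input format for translation models."""
    return " ".join(tokenize_smiles(smiles))


if __name__ == "__main__":
    # Paracetamol, used here only as a familiar example molecule.
    print(to_model_input("CC(=O)Nc1ccc(O)cc1"))
```

A translation model would then be trained on pairs of such token sequences, with the parent compound as the source and a metabolite as the target.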

SUBMITTER: Litsa EE

PROVIDER: S-EPMC8162519 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: The contribution of inhibitory metabolites of perpetrator drugs to drug-drug interactions (DDIs) is rarely considered and is underestimated. However, the occurrence of unexpected DDIs suggests that metabolites may contribute to the observed interactions. The aim of this study was to develop a physiologically-based pharmacokinetic (PBPK) model for bupropion and its three primary metabolites (hydroxybupropion, threohydrobupropion and erythrohydrobupropion) based on a mixed "bottom-up" and "top-down" approach, and to contribute to the understanding of the involvement and impact of inhibitory metabolites in DDIs observed in the clinic. PK profiles from clinical studies at different dosages were used to verify the bupropion model. The PBPK model captured reasonable PK profiles of bupropion and its metabolites. Confidence in the DDI prediction involving bupropion and co-administered CYP2D6 substrates could be maximized. The predicted maximum concentration (Cmax) and area under the concentration-time curve (AUC) values, as well as the Cmax and AUC ratios, were consistent with clinically observed data. Adding the inhibitory metabolites to the PBPK model resulted in a more accurate prediction of DDIs (AUC and Cmax ratios) than a model that considered only P450 inhibition by the parent drug (bupropion). The simulation suggests that bupropion and its metabolites contribute to the DDI between bupropion and CYP2D6 substrates. In order of decreasing inhibitory potency, the compounds rank as hydroxybupropion, threohydrobupropion, erythrohydrobupropion, and bupropion. The present bupropion PBPK model can be useful for predicting inhibition by bupropion in other clinical studies. This study highlights the need for caution and dosage adjustment when combining bupropion with medications metabolized by CYP2D6. It also demonstrates the feasibility of applying the PBPK approach to predict the DDI potential of drugs undergoing complex metabolism, especially for DDIs involving inhibitory metabolites.

Project description: Objective: To analyze techniques for machine translation of electronic health records (EHRs) between long-distance languages, using Basque and Spanish as a reference. We studied distinct configurations of neural machine translation systems and used different methods to overcome the lack of a bilingual corpus of clinical texts or health records in Basque and Spanish. Materials and methods: We trained recurrent neural networks on an out-of-domain corpus with different hyperparameter values. Subsequently, we used the optimal configuration to evaluate machine translation of EHR templates between Basque and Spanish, using manual translations of the Basque templates into Spanish as a standard. We successively added clinical resources to the training corpus, including a Spanish-Basque dictionary derived from resources built for the machine translation of the Spanish edition of SNOMED CT into Basque, artificial sentences in Spanish and Basque derived from frequently occurring relationships in SNOMED CT, and Spanish monolingual EHRs. Apart from calculating bilingual evaluation understudy (BLEU) values, we tested the performance in the clinical domain by human evaluation. Results: We achieved slight improvements over our reference system by tuning some hyperparameters using an out-of-domain bilingual corpus, obtaining 10.67 BLEU points for Basque-to-Spanish clinical domain translation. The inclusion of clinical terminology in Spanish and Basque and the application of the back-translation technique on monolingual EHRs significantly improved the performance, obtaining 21.59 BLEU points. This was confirmed by the human evaluation performed by 2 clinicians, who ranked our machine translations close to the human translations. Discussion: We showed that, even after optimizing the hyperparameters out-of-domain, the inclusion of available resources from the clinical domain and the applied methods were beneficial for the described objective, managing to obtain adequate translations of EHR templates. Conclusion: We have developed a system that is able to properly translate health record templates from Basque to Spanish without making use of any bilingual corpus of clinical texts or health records.


OmicsDI is part of the ELIXIR infrastructure
