Dataset Information

Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology.

ABSTRACT: A major cause of failed drug discovery programs is suboptimal target selection, resulting in the development of drug candidates that are potent inhibitors, but ineffective at treating the disease. In the genomics era, the availability of large biomedical datasets with genome-wide readouts has the potential to transform target selection and validation. In this study we investigate how computational intelligence methods can be applied to predict novel therapeutic targets in oncology. We compared different machine learning classifiers applied to the task of drug target classification for nine different human cancer types. For each cancer type, a set of "known" target genes was obtained and equally-sized sets of "non-targets" were sampled multiple times from the human protein-coding genes. Models were trained on mutation, gene expression (TCGA), and gene essentiality (DepMap) data. In addition, we generated a numerical embedding of the interaction network of protein-coding genes using deep network representation learning and included the results in the modeling. We assessed feature importance using a random forests classifier and performed feature selection based on measuring permutation importance against a null distribution. Our best models achieved good generalization performance based on the AUROC metric. With the best model for each cancer type, we ran predictions on more than 15,000 protein-coding genes to identify potential novel targets. Our results indicate that this approach may be useful to inform early stages of the drug discovery pipeline.
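The sampling and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene identifiers are made up, the function names `sample_balanced_sets` and `auroc` are hypothetical, and the AUROC is computed with a plain pairwise definition rather than any particular library.

```python
import random


def sample_balanced_sets(all_genes, known_targets, n_repeats, seed=0):
    """Build n_repeats labelled sets, each pairing the known targets (label 1)
    with an equally sized random draw of non-target genes (label 0)."""
    rng = random.Random(seed)
    targets = sorted(set(known_targets))
    # Candidate negatives: every gene that is not a known target.
    candidates = sorted(set(all_genes) - set(targets))
    if len(candidates) < len(targets):
        raise ValueError("not enough non-target genes to sample from")
    datasets = []
    for _ in range(n_repeats):
        # Sampling without replacement within a set; sets may overlap each other.
        non_targets = rng.sample(candidates, len(targets))
        labelled = [(g, 1) for g in targets] + [(g, 0) for g in non_targets]
        datasets.append(labelled)
    return datasets


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: hypothetical gene identifiers, not taken from the study.
genes = [f"GENE{i}" for i in range(100)]
known = genes[:10]
sets = sample_balanced_sets(genes, known, n_repeats=5, seed=42)
```

Each element of `sets` could then be used to train and score one classifier, with the AUROC values averaged across the repeated draws to reduce dependence on any single choice of non-targets.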

SUBMITTER: Bazaga A

PROVIDER: S-EPMC7330039 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology.

Adrián Bazaga, Dan Leggate, Hendrik Weisser

Scientific Reports, 2020-07-01, issue 1


PMID: 32612205

Similar Datasets

Project description: Shigella is a major contributor to bacterial dysentery worldwide, particularly in developing countries with inadequate sanitation and hygiene. The emergence of multidrug-resistant strains exacerbates the challenge of treating Shigella infections, particularly in regions where access to healthcare and alternative antibiotics is limited. Investigating how bacteria evade antibiotics and eventually develop resistance could therefore open new avenues for developing novel therapeutics. The aim of this study was to analyze whole genome sequences (WGS) of human pathogenic Shigella spp. to elucidate the antibiotic resistance genes (ARGs) and their mechanisms of resistance, gene-drug interactions, protein-protein interactions, and functional pathways in order to screen potential therapeutic candidates. We comprehensively analyzed 45 WGS of Shigella, including S. flexneri (n = 17), S. dysenteriae (n = 14), S. boydii (n = 11), and S. sonnei (n = 13), using different bioinformatics tools. Evolutionary phylogenetic analysis showed three distinct clades among the circulating strains of Shigella worldwide, with little genomic diversity. In this study, 2,146 ARGs were predicted in 45 genomes (average 47.69 ARGs/genome), of which only 91 ARGs were shared across the genomes. The majority of these ARGs conferred resistance through antibiotic efflux pumps (51.0%), followed by antibiotic target alteration (23%) and antibiotic target replacement (18%). We identified 13 hub proteins, of which four (tolC, acrR, mdtA, and gyrA) were detected as potential hub proteins associated with antibiotic efflux pump and target alteration mechanisms. These hub proteins were significantly (p < 0.05) enriched in biological process, molecular function, and cellular component categories. The findings of this study therefore suggest that human pathogenic Shigella strains harbor a wide range of ARGs that confer resistance through antibiotic efflux pumps and antibiotic target modification mechanisms, which must be taken into account when devising treatment strategies against this pathogen. Moreover, the identified hub proteins could be exploited to design and develop novel therapeutics against MDR pathogens like Shigella.

Project description: Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. Methods: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. Results: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Conclusions: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.

Project description: Background: There are no effective pharmacological treatments for sarcopenia. We aim to identify potential therapeutic targets for sarcopenia by integrating various publicly available datasets. Methods: We integrated druggable genome data, cis-eQTL/cis-pQTL from human blood and skeletal muscle tissue, and GWAS summary data of sarcopenia-related traits to analyse the potential causal relationships between drug target genes and sarcopenia using the Mendelian Randomization (MR) method. Sensitivity analyses and Bayesian colocalization were employed to validate the causal relationships. We also assessed the side effects or additional indications of the identified drug targets using a phenome-wide MR (Phe-MR) approach and investigated actionable drugs for target genes using available databases. Results: MR analysis identified 17 druggable genes with potential causation to sarcopenia in human blood or skeletal muscle tissue. Six of them (HP, HLA-DRA, MAP3K3, MFGE8, COL15A1, and AURKA) were further confirmed by Bayesian colocalization (PPH4 > 90%). The up-regulation of HP [higher ALM (beta: 0.012, 95% CI: 0.007-0.018, P = 1.2 × 10^-5) and higher grip strength (OR: 0.96, 95% CI: 0.94-0.98, P = 4.2 × 10^-5)], MAP3K3 [higher ALM (beta: 0.24, 95% CI: 0.21-0.26, P = 1.8 × 10^-94), higher grip strength (OR: 0.82, 95% CI: 0.75-0.90, P = 2.1 × 10^-5), and faster walking pace (beta: 0.03, 95% CI: 0.02-0.05, P = 8.5 × 10^-6)], and MFGE8 [higher ALM (muscle eQTL, beta: 0.09, 95% CI: 0.06-0.11, P = 6.1 × 10^-13; blood pQTL, beta: 0.05, 95% CI: 0.03-0.07, P = 3.8 × 10^-9)], as well as the down-regulation of HLA-DRA [lower ALM (beta: -0.09, 95% CI: -0.11 to -0.08, P = 5.4 × 10^-36) and lower grip strength (OR: 1.13, 95% CI: 1.07-1.20, P = 1.8 × 10^-5)] and COL15A1 [higher ALM (muscle eQTL, beta: -0.07, 95% CI: -0.10 to -0.04, P = 3.4 × 10^-7; blood pQTL, beta: -0.05, 95% CI: -0.06 to -0.03, P = 1.6 × 10^-7)], decreased the risk of sarcopenia. AURKA in blood (beta: -0.16, 95% CI: -0.22 to -0.09, P = 2.1 × 10^-6) and skeletal muscle (beta: 0.03, 95% CI: 0.02 to 0.05, P = 5.3 × 10^-5) tissues showed an inverse relationship with sarcopenia risk. The Phe-MR indicated that the six potential therapeutic targets for sarcopenia had no significant adverse effects. Drug repurposing analysis supported zinc supplementation and collagenase clostridium histolyticum might be potential therapeutics for sarcopenia by activating HP and inhibiting COL15A1, respectively. Conclusions: Our research indicated MAP3K3, MFGE8, COL15A1, HP, and HLA-DRA may serve as promising targets for sarcopenia, while the effectiveness of zinc supplementation and collagenase clostridium histolyticum for sarcopenia requires further validation.

Project description: Helicobacter pylori is a gram-negative bacterium that colonizes the human gastric mucosa and can lead to gastric inflammation, ulcers, and stomach cancer. Due to the increase in H. pylori antimicrobial resistance, new methods to identify the molecular mechanisms of H. pylori-induced pathology are urgently needed. Here we used a computational biology approach, harnessing genome-wide association and gene expression studies to identify genes and pathways determining disease development. We mined gene expression data related to H. pylori infection and its complications from publicly available databases to identify four human datasets as discovery datasets, and used two different multi-cohort analysis pipelines to define an H. pylori-induced gene signature. An initial Helicobacter signature was curated using the MetaIntegrator pipeline and validated in cell line model datasets. With this approach we identified cell line models that best match gene regulation in human pathology. A second analysis pipeline through NetworkAnalyst was used to refine our initial signature. This approach defined a 55-gene signature that is stably deregulated in disease conditions. The 55-gene signature was validated in datasets from human gastric adenocarcinomas and could separate tumor from normal tissue. As only a small number of H. pylori patients develop cancer, this gene signature must interact with other host and environmental factors to initiate tumorigenesis. We tested for possible interactions between our curated gene signature and host genomic background mutations and polymorphisms by integrating genome-wide association studies (GWAS) and known oncogenes. We analyzed public databases to identify genes harboring single nucleotide polymorphisms (SNPs) associated with gastric pathologies and driver genes in gastric cancers. Using this approach, we identified 37 genes from GWAS and 61 oncogenes, which were used with our 55-gene signature to map gene-gene interaction networks. In conclusion, our analysis defines a unique gene signature that is driven by H. pylori infection at early phases and remains relevant through different stages of pathology up to gastric cancer, a stage where H. pylori itself is rarely detectable. Furthermore, this signature elucidates many factors of host gene and pathway regulation in infection and can be used as a target for drug repurposing and for testing the suitability of infection models for investigating human infection.
