Dataset Information

In silico prediction of novel therapeutic targets using gene-disease association data.

ABSTRACT:

Background

Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.

Methods

To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.
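The Methods describe model selection with nested cross-validation. The idea is that an inner loop tunes hyperparameters using only the training part of each outer split, and the outer loop then scores the tuned model on data it never saw during tuning. The sketch below is a minimal, standard-library illustration under stated assumptions. It does not reproduce the authors' pipeline. The classifier is a hypothetical stand-in (k-nearest neighbours, not one of the four models named above), its tuned hyperparameter is k, and the data are synthetic Gaussian clusters rather than Open Targets association scores.

```python
import random
from statistics import mean


# Hypothetical stand-in classifier: k-nearest neighbours (NOT one of the four
# models used in the paper). Its hyperparameter k is what the inner loop tunes.
def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > k else 0


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def k_folds(data, n_folds):
    """Split data into n_folds interleaved folds (data is assumed pre-shuffled)."""
    return [data[i::n_folds] for i in range(n_folds)]


def cv_score(data, k, n_folds):
    """Plain cross-validated accuracy for a fixed k (used as the inner loop)."""
    folds = k_folds(data, n_folds)
    scores = []
    for i, held_out in enumerate(folds):
        fit = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(fit, held_out, k))
    return mean(scores)


def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3):
    """Outer loop estimates performance; inner loop picks k on outer-training data only."""
    folds = k_folds(data, outer_folds)
    outer_scores = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        # Tuning sees only `train`; the outer `test` fold stays untouched until scoring.
        best_k = max(candidate_ks, key=lambda k: cv_score(train, k, inner_folds))
        outer_scores.append(accuracy(train, test, best_k))
    return mean(outer_scores)


def make_toy_data(n_per_class=40, seed=0):
    """Two well-separated Gaussian clusters labelled 0 and 1 (synthetic, not study data)."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0) for _ in range(n_per_class)]
    data += [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    score = nested_cv(make_toy_data(), candidate_ks=[1, 3, 5])
    print(f"nested-CV accuracy estimate: {score:.2f}")
```

Because k is chosen inside each outer training split, the outer-fold scores are not inflated by the tuning step. Tuning and scoring on the same folds would be optimistically biased.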

Results

We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.
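The Results report both accuracy and AUC, and the two measure different things. Accuracy depends on a single decision threshold, while AUC summarises how well the classifier ranks positives above negatives across all thresholds. The minimal sketch below makes the distinction concrete. It computes AUC as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. The function names, the 0.5 threshold and the toy scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of examples labelled correctly when score >= threshold means 'target'."""
    return sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


labels = [1, 1, 0, 0]          # toy labels: 1 = known target, 0 = non-target
scores = [0.9, 0.4, 0.6, 0.1]  # toy classifier scores (invented)
print(roc_auc(labels, scores))      # 0.75
print(accuracy_at(labels, scores))  # 0.5
```

Here the toy classifier ranks three of the four positive-negative pairs correctly (AUC 0.75) yet labels only half the examples correctly at a 0.5 cutoff. This is why the two figures are reported separately.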

Conclusions

Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.

SUBMITTER: Ferrero E

PROVIDER: S-EPMC5576250 | biostudies-literature | 2017 Aug

REPOSITORIES: biostudies-literature


Publications

In silico prediction of novel therapeutic targets using gene-disease association data.

Ferrero E, Dunham I, Sanseau P

Journal of Translational Medicine, 2017 Aug 29, issue 1


PMID: 28851378

Similar Datasets

Project description: Approximately half of known human miRNAs are located in the introns of protein-coding genes. Some of these intronic miRNAs are expressed only when their host gene is expressed and, as such, their steady-state expression levels are highly correlated with those of the host gene's mRNA. Recently, host gene expression levels have been used to predict the targets of intronic miRNAs by identifying other mRNAs with which they are consistently negatively correlated. This is a potentially powerful approach because it allows a large number of expression profiling studies to be used, but it needs refinement because mRNAs can be targeted by multiple miRNAs and not all intronic miRNAs are co-expressed with their host genes.

Here we introduce InMiR, a new computational method that uses a linear-Gaussian model to predict the targets of intronic miRNAs based on the expression profiles of their host genes across a large number of datasets. Our method recovers nearly twice as many true positives at the same fixed false positive rate as a comparable method that considers only correlations. Through an analysis of 140 Affymetrix datasets from Gene Expression Omnibus, we build a network of 19,926 interactions among 57 intronic miRNAs and 3,864 targets. InMiR can also predict which host genes have expression profiles that are good surrogates for those of their intronic miRNAs. Host genes that InMiR predicts are bad surrogates contain significantly more miRNA target sites in their 3' UTRs and are significantly more likely to have predicted Pol II and Pol III promoters in their introns.

We provide a dataset of 1,935 predicted mRNA targets for 22 intronic miRNAs. These predictions are supported by both sequence features and expression. By combining our results with previous reports, we distinguish three classes of intronic miRNAs: those that are tightly regulated with their host gene; those that are likely to be expressed from the same promoter but whose host gene is highly regulated by miRNAs; and those likely to have independent promoters.
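The description contrasts InMiR with a simpler approach that predicts intronic miRNA targets as mRNAs whose expression is consistently negatively correlated with the host gene. The minimal sketch below illustrates only that correlation-only baseline, not InMiR's linear-Gaussian model. The function names, the -0.5 cutoff and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def candidate_targets(host, mrnas, threshold=-0.5):
    """Hypothetical correlation baseline: mRNAs with r <= threshold vs. the host gene,
    most negatively correlated first. The host profile stands in for the miRNA's own."""
    scored = [(name, pearson(host, expr)) for name, expr in mrnas.items()]
    hits = [(name, r) for name, r in scored if r <= threshold]
    return sorted(hits, key=lambda item: item[1])


host = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy host-gene expression across five samples
mrnas = {
    "geneA": [5.0, 4.0, 3.0, 2.0, 1.0],  # anti-correlated with the host
    "geneB": [1.0, 2.0, 3.0, 4.0, 5.0],  # co-expressed with the host
    "geneC": [2.0, 1.0, 2.0, 1.0, 2.0],  # uncorrelated
}
print(candidate_targets(host, mrnas))  # only geneA passes, with r close to -1
```

As the description notes, this baseline ignores that an mRNA can be targeted by several miRNAs and that some intronic miRNAs are not co-expressed with their host, which is the gap a model-based method aims to close.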
