Dataset Information

ENTPRISE-X: Predicting disease-associated frameshift and nonsense mutations.


ABSTRACT: To exploit the wealth of information provided by Next Generation Sequencing, a crucial task is identifying the genetic mutations responsible for disease in general, or cancer in particular, among the thousands of neutral germline or somatic variations. Genome-wide association studies for detecting disease-associated genes or cancer drivers can only identify common variations or driver genes in a cohort of patients. Thus, they cannot discover unique disease-associated mutations or cancer driver genes on a personal basis. Moreover, even when such common variations exist, their significance is unknown. Here, we extend ENTPRISE, a machine-learning-based approach developed for predicting the disease association of missense mutations, to frameshift and nonsense mutations. The new approach, ENTPRISE-X, is shown to outperform the state-of-the-art methods VEST-indel and DDIG-in at predicting the disease association of germline frameshift mutations in terms of the balanced measure Matthews correlation coefficient (MCC): 0.586 for ENTPRISE-X, versus 0.412 for VEST-indel and 0.321 for DDIG-in. Large-scale testing on the ExAC dataset shows that ENTPRISE-X classifies a much lower fraction of variations as disease-associated (16%) than VEST-indel (26%) or DDIG-in (65%). A web server for ENTPRISE-X is freely available for academic users at http://cssb2.biology.gatech.edu/entprise-x.
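For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a Matthews correlation coefficient is computed from a binary confusion matrix. The function name `mcc` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the ENTPRISE-X benchmark, and this is not the authors' evaluation code.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    tp/tn/fp/fn are the true-positive, true-negative, false-positive and
    false-negative counts. The result ranges from -1 (total disagreement)
    through 0 (no better than chance) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the confusion
    # matrix is empty and the coefficient is otherwise undefined.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, for illustration only.
example = mcc(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {example:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when disease-associated and neutral variants are present in unequal numbers, which is presumably why the abstract describes it as a balanced measure.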

SUBMITTER: Zhou H 

PROVIDER: S-EPMC5933770 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

ENTPRISE-X: Predicting disease-associated frameshift and nonsense mutations.

Hongyi Zhou, Mu Gao, Jeffrey Skolnick

PLoS One, 2018-05-03, issue 5


Similar Datasets

| S-EPMC6245819 | biostudies-other
| S-EPMC10682384 | biostudies-literature
| S-EPMC6978253 | biostudies-literature
| S-EPMC3049309 | biostudies-literature
| S-EPMC2728896 | biostudies-literature
| S-EPMC10314070 | biostudies-literature
| S-EPMC3674965 | biostudies-literature
| S-EPMC2933350 | biostudies-literature
| S-EPMC10959953 | biostudies-literature
| S-EPMC1235530 | biostudies-literature