Dataset Information

Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling.

ABSTRACT: Computational prediction yields efficient and scalable initial assessments of how variants of unknown significance may affect human health. However, when discrepancies between these predictions and direct experimental measurements of functional impact arise, inaccurate computational predictions are frequently assumed as the source. Here, we present a methodological analysis indicating that shortcomings in both computational and biological data can contribute to these disagreements. We demonstrate that incomplete assaying of multifunctional proteins can affect the strength of correlations between prediction and experiments; a variant's full impact on function is better quantified by considering multiple assays that probe an ensemble of protein functions. Additionally, many variant predictions are sensitive to protein alignment construction and can be customized to maximize relevance of predictions to a specific experimental question. We conclude that inconsistencies between computation and experiment can often be attributed to the fact that they do not test identical hypotheses. Aligning the design of the computational input with the design of the experimental output will require cooperation between computational and biological scientists, but will also lead to improved estimations of computational prediction accuracy and a better understanding of the genotype-phenotype relationship.

SUBMITTER: Gallion J

PROVIDER: S-EPMC5516182 | biostudies-literature | 2017 May

REPOSITORIES: biostudies-literature

Publications

Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling.

Jonathan Gallion, Amanda Koire, Panagiotis Katsonis, Anne-Marie Schoenegge, Michel Bouvier, Olivier Lichtarge

Human Mutation, 2017-02-28, issue 5

PMID: 28230923

Similar Datasets

Project description:
Purpose: The accuracy of predicting conversion from early-stage age-related macular degeneration (AMD) to the advanced stages of choroidal neovascularization (CNV) or geographic atrophy (GA) was evaluated to determine whether inclusion of clinically relevant genetic markers improved accuracy beyond prediction using phenotypic risk factors alone.
Design: Cohort study.
Participants: White, non-Hispanic subjects participating in the Age-Related Eye Disease Study (AREDS) sponsored by the National Eye Institute consented to provide a genetic specimen. Of 2415 DNA specimens available, 940 were from disease-free subjects and 1475 were from subjects with early or intermediate AMD.
Methods: DNA specimens from study subjects were genotyped for 14 single nucleotide polymorphisms (SNPs) in genes shown previously to associate with CNV: ARMS2, CFH, C3, C2, FB, CFHR4, CFHR5, and F13B. Clinical demographics and established disease associations, including age, sex, smoking status, body mass index (BMI), AREDS treatment category, and educational level, were evaluated. Four multivariate logistic models (phenotype; genotype; phenotype + genotype; and phenotype + genotype + demographic + environmental factors) were tested using 2 end points (CNV, GA). Models were fitted using Cox proportional hazards regression to use time-to-disease onset data.
Main outcome measures: Brier score (measure of accuracy) was used to identify the model with the lowest prediction error in the training set. The most accurate model was subjected to independent statistical validation, and final model performance was described using area under the receiver operator curve (AUC) or C-statistic.
Results: The CNV prediction models that combined genotype with phenotype with or without age and smoking revealed superior performance (C-statistic = 0.96) compared with the phenotype model based on the simplified severity scale and the presence of CNV in the nonstudy eye (C-statistic = 0.89; P<0.01). For GA, the model that combined genotype with phenotype demonstrated the highest performance (AUC = 0.94). Smoking status and ARMS2 genotype had less of an impact on the prediction of GA compared with CNV.
Conclusions: Inclusion of genotype assessment improves CNV prediction beyond that achievable with phenotype alone and may improve patient management. Separate assessments should be used to predict progression to CNV and GA because genetic markers and smoking status do not equally predict both end points.
Financial disclosure(s): Proprietary or commercial disclosure may be found after the references.

Project description: Most human birth defects are phenotypically variable even when they share a common genetic basis. Our understanding of the mechanisms of this variation is limited, but they are thought to be due to complex gene-environment interactions. Loss of the transcription factor Gata3 associates with the highly variable human birth defects HDR syndrome and microsomia, and can lead to disruption of the neural crest-derived facial skeleton. We have demonstrated that zebrafish gata3 mutants model the variability seen in humans, with genetic background and candidate pathways modifying the resulting phenotype. In this study, we sought to use an unbiased bioinformatic approach to identify environmental modifiers of gata3 mutant craniofacial phenotypes. The LINCS L1000 dataset identifies chemicals that generate differential gene expression that either positively or negatively correlates with an input gene list. These chemicals are predicted to worsen or lessen the mutant phenotype, respectively. We performed RNA-seq on neural crest cells isolated from zebrafish across control, Gata3 loss-of-function, and Gata3 rescue groups. Differential expression analyses revealed 551 potential targets of gata3. We queried the LINCS database with the 100 most upregulated and 100 most downregulated genes. We tested the top eight available chemicals predicted to worsen the mutant phenotype and the top eight predicted to lessen the phenotype. Of these, we found that vinblastine, a microtubule inhibitor, and clofibric acid, a PPAR-alpha agonist, did indeed worsen the gata3 phenotype. The Topoisomerase II and RNA-pol II inhibitors daunorubicin and triptolide, respectively, lessened the phenotype. GO analysis identified Wnt signaling and RNA polymerase function as being enriched in our RNA-seq data, consistent with the mechanism of action of some of the chemicals. Our study illustrates multiple potential pathways for Gata3 function, and demonstrates a systematic, unbiased process to identify modifiers of genotype-phenotype correlations.
