
Dataset Information


Will Big Data Close the Missing Heritability Gap?


ABSTRACT: Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a response surface relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated response surface, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.
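To make the two quantities in the abstract concrete (prediction R-sq. in a held-out test set, and an R-sq. curve that rises with sample size toward a plateau), here is a minimal sketch. It is not the authors' Bayesian response-surface model. It assumes that prediction R-sq. is the squared Pearson correlation between observed and predicted phenotypes, and it uses a hypothetical saturating curve R-sq.(n) = a * n / (n + b), where a is the plateau and b is the sample size at which half the plateau is reached. The function names and the curve form are illustrative assumptions. The curve is fitted by a simple grid search over b, with a closed-form least-squares a for each b.

```python
def prediction_r2(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values
    (one common definition of prediction R-sq. in a test set)."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mean_o) ** 2 for o in y_obs)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)


def saturating_r2(n, plateau, half_n):
    """Hypothetical curve: rises with training size n and approaches `plateau`;
    equals plateau / 2 when n == half_n. An illustrative assumption only."""
    return plateau * n / (n + half_n)


def fit_saturating(ns, r2s, half_n_grid):
    """Fit (plateau, half_n) by grid search over half_n.

    For a fixed half_n the curve is linear in the plateau, so the
    least-squares plateau has the closed form sum(x*y) / sum(x*x).
    """
    best = None
    for h in half_n_grid:
        xs = [n / (n + h) for n in ns]
        plateau = sum(x * y for x, y in zip(xs, r2s)) / sum(x * x for x in xs)
        sse = sum((y - plateau * x) ** 2 for x, y in zip(xs, r2s))
        if best is None or sse < best[0]:
            best = (sse, plateau, h)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic points generated from the curve itself (not UK Biobank results).
    ns = [10_000, 20_000, 40_000, 80_000]
    r2s = [saturating_r2(n, 0.4, 30_000) for n in ns]
    plateau, half_n = fit_saturating(ns, r2s, range(5_000, 100_001, 5_000))
    print(plateau, half_n)
    # Forecast R-sq. at a larger, hypothetical training size.
    print(saturating_r2(500_000, plateau, half_n))
```

Under this assumed form, extrapolating to larger n always stays below the fitted plateau. That mirrors the abstract's point that more data narrows the gap to heritability without necessarily closing it.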

SUBMITTER: Kim H 

PROVIDER: S-EPMC5676235 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature


Publications

Will Big Data Close the Missing Heritability Gap?

Kim H, Grueneberg A, Vazquez AI, Hsu S, de Los Campos G

Genetics, 2017-09-11, issue 3



Similar Datasets

| S-EPMC5289922 | biostudies-literature
| PRJEB29771 | ENA
| S-EPMC8059005 | biostudies-literature
| S-EPMC3268279 | biostudies-other
| S-EPMC5520063 | biostudies-other
| S-EPMC5862984 | biostudies-literature
| S-EPMC3820606 | biostudies-literature
| S-EPMC4742325 | biostudies-literature
| S-EPMC7352099 | biostudies-literature
| S-EPMC3084207 | biostudies-literature