Dataset Information

Risk Prediction Modeling on Family-Based Sequencing Data Using a Random Field Method.

ABSTRACT: Family-based design is one of the most popular designs in genetic studies and has many unique features for risk-prediction research. It is robust against genetic heterogeneity, and the relatedness among family members can be informative for predicting an individual's risk for disease with polygenic and shared environmental components of risk. Despite these strengths, family-based designs have been used infrequently in current risk-prediction studies, and their related statistical methods have not been well developed. In this article, we developed a generalized random field (GRF) method for family-based risk-prediction modeling on sequencing data. In GRF, subjects' phenotypes are viewed as stochastic realizations of a random field in a space, and a subject's phenotype is predicted by adjacent subjects, where adjacencies between subjects are determined by their genetic and within-family similarities. Unlike existing methods that adjust for familial correlations, the GRF uses this information to form surrogates that further improve prediction accuracy. It also uses within-family information to capture predictors (e.g., rare mutations) that are homogeneous in families. Through simulations, we have demonstrated that the GRF method attained better performance than an existing method by considering additional information from family members and accounting for genetic heterogeneity. We further provided practical recommendations for designing family-based risk prediction studies. Finally, we illustrated the GRF method with an application to a whole-genome exome data set from the Michigan State University Twin Registry study.
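
To make "a subject's phenotype is predicted by adjacent subjects" concrete, here is a minimal Python sketch. It is a simplified kernel-weighted neighbor predictor, not the published GRF estimator: the identity-by-state (IBS) similarity, the additive same-family bonus `family_weight`, and the function names `ibs_similarity`, `adjacency`, and `predict_loo` are hypothetical choices made for illustration. Genotypes are assumed to be coded as minor-allele counts (0/1/2), and each subject is predicted from all other subjects, weighted by adjacency, with the subject itself left out.

```python
from typing import List, Sequence


def ibs_similarity(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Identity-by-state similarity for genotypes coded as minor-allele counts (0/1/2).

    Returns a value in [0, 1]; 1.0 means identical genotypes at every variant.
    """
    if len(g1) != len(g2) or len(g1) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))  # alleles shared per variant
    return shared / (2 * len(g1))


def adjacency(genotypes: Sequence[Sequence[int]], families: Sequence[str],
              family_weight: float = 1.0) -> List[List[float]]:
    """Hypothetical adjacency: genetic (IBS) similarity plus a bonus for relatives."""
    n = len(genotypes)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a subject is never its own neighbor
            w[i][j] = ibs_similarity(genotypes[i], genotypes[j])
            if families[i] == families[j]:
                w[i][j] += family_weight  # assumption: relatives get extra weight
    return w


def predict_loo(phenotypes: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """Predict each phenotype as the adjacency-weighted mean of the other subjects."""
    preds = []
    for i, row in enumerate(w):
        total = sum(row)  # row[i] == 0, so subject i does not contribute to its own prediction
        if total == 0:
            others = [y for j, y in enumerate(phenotypes) if j != i]
            preds.append(sum(others) / len(others))  # fall back to the mean of the others
        else:
            preds.append(sum(wij * y for wij, y in zip(row, phenotypes)) / total)
    return preds


if __name__ == "__main__":
    # Invented toy data: two families (A, B), three variants per subject.
    genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [2, 1, 1]]
    families = ["A", "A", "B", "B"]
    phenotypes = [1.0, 1.2, 3.0, 2.8]
    w = adjacency(genotypes, families, family_weight=1.0)
    print([round(p, 2) for p in predict_loo(phenotypes, w)])  # → [1.69, 1.55, 2.35, 2.33]
```

Each family's members are predicted mostly from their relatives, so family A's predictions stay low and family B's stay high. Raising `family_weight` shifts more weight toward relatives, which is a crude analogue of using within-family information to capture predictors (such as rare mutations) that are shared within a family.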

SUBMITTER: Wen Y

PROVIDER: S-EPMC5586386 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature

Publications

Risk Prediction Modeling on Family-Based Sequencing Data Using a Random Field Method.

Yalu Wen, Alexandra Burt, Qing Lu

Genetics, 2017-07-05, issue 1

PMID: 28679544

Similar Datasets

Project description: The relationship between the human gut microbiota and disease is of increasing scientific interest. Previous investigations have focused on the differences in intestinal bacterial abundance between control and affected groups to identify disease biomarkers. However, different types of intestinal bacteria may have interacting effects and thus be considered biomarker complexes for disease. To investigate this, we aimed to identify a new kind of biomarker for atopic dermatitis using structural equation modeling (SEM). The biomarkers identified were latent variables, which are complex and derived from the abundance data for bacterial marker candidates. Groups of females and males classified as healthy participants [normal control (NC) (female: 321 participants, male: 99 participants)], and patients afflicted with atopic dermatitis only [AS (female: 45 participants, male: 13 participants)], with atopic dermatitis and other diseases [AM (female: 75 participants, male: 34 participants)], and with other diseases but without atopic dermatitis [OD (female: 1,669 participants, male: 866 participants)] were used in this investigation. The candidate bacterial markers were identified by comparing the intestinal microbial community compositions between the NC and AS groups. In females, two latent variables (lv) were identified; for lv1, the associated components (bacterial genera) were Alistipes, Butyricimonas, and Coprobacter, while for lv2, the associated components were Agathobacter, Fusicatenibacter, and Streptococcus. There was a significant difference in the lv2 scores between the groups with atopic dermatitis (AS, AM) and those without (NC, OD), and the genera identified for lv2 are associated with the suppression of inflammatory responses in the body.
A logistic regression model estimating the probability of atopic dermatitis with lv2 as the explanatory variable had an area under the curve (AUC) of 0.66 in receiver operating characteristic (ROC) analysis, which was higher than the AUCs of the other logistic regression models. The results indicate that the latent variables, especially lv2, could represent the effects of atopic dermatitis on the intestinal microbiome in females. The latent variables in the SEM could thus be used as a new type of biomarker. The advantages identified for the SEM are as follows: (1) it enables the extraction of more sophisticated information than models focused on individual bacteria, and (2) it can improve the accuracy of the latent variables used as biomarkers, as the SEM can be expanded.
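
An AUC such as the 0.66 above can be read as the probability that a randomly chosen participant with atopic dermatitis receives a higher predicted score than a randomly chosen participant without it. The minimal pure-Python sketch below computes AUC in that rank-based (Mann-Whitney) form; the function name `roc_auc` and the toy scores are hypothetical and are not taken from the study, which does not state how its AUC was computed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen case (label 1) gets a higher
    score than a randomly chosen control (label 0); ties count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:  # compare every case/control pair
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Invented predicted probabilities, e.g. from a fitted logistic regression.
    scores = [0.9, 0.4, 0.6, 0.2]
    labels = [1, 1, 0, 0]
    print(roc_auc(scores, labels))  # → 0.75
```

The pairwise loop is quadratic in sample size. For cohorts as large as the OD group, a sort-based rank computation gives the same value more efficiently.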
