Project description:As key variance-partitioning tools, linear mixed models (LMMs) fitted with genome-based restricted maximum likelihood (GREML) accommodate both fixed and random effects. Classic LMMs assume independence between random effects, an assumption that can be violated and cause bias. Here we introduce a generalized GREML, named CORE GREML, that explicitly estimates the covariance between random effects. Using extensive simulations, we show that CORE GREML outperforms conventional GREML, providing variance and covariance estimates free from bias due to correlated random effects. Applying CORE GREML to UK Biobank data, we find, for example, that the transcriptome, imputed using genotype data, explains a significant proportion of phenotypic variance for height (0.15, p-value = 1.5e-283), and that these transcriptomic effects correlate with the genomic effects (genome-transcriptome correlation = 0.35, p-value = 1.2e-14). We conclude that the covariance between random effects is a key parameter to estimate, especially when partitioning phenotypic variance across multi-omics layers.
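A minimal sketch, in LaTeX, of the kind of covariance structure such a model implies; the kernel names K_g (genomic) and K_t (transcriptomic) and the matrix-square-root construction of the cross term are illustrative assumptions, not necessarily the authors' exact parameterization:

\begin{aligned}
y &= X\beta + g + t + \varepsilon, \qquad g \sim N(0, \sigma^2_g K_g), \quad t \sim N(0, \sigma^2_t K_t),\\
\operatorname{Var}(y) &= \sigma^2_g K_g + \sigma^2_t K_t + \sigma_{gt}\left(K_g^{1/2} K_t^{1/2\top} + K_t^{1/2} K_g^{1/2\top}\right) + \sigma^2_\varepsilon I_n,\\
\rho_{gt} &= \sigma_{gt} \big/ \sqrt{\sigma^2_g \sigma^2_t},
\end{aligned}

so that conventional GREML is recovered by fixing \sigma_{gt} = 0.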
Project description:Understanding the mechanisms of ecological community dynamics, and how they may be affected by environmental change, is important. Population dynamic models have well-known ecological parameters that describe key characteristics of species, such as the effects of environmental noise and demographic variance on the dynamics, the long-term growth rate, and the strength of density regulation. These parameters are also central for detecting and understanding changes in communities of species; however, incorporating such vital parameters into models of community dynamics is challenging. In this paper, we demonstrate how generalized linear mixed models, specified as intercept-only models with different random effects, can be used to fit dynamic species abundance distributions. Each random effect has an ecologically meaningful interpretation, describing either general and species-specific responses to environmental stochasticity in time or space, or variation in growth rate and carrying capacity among species. We use simulations to show that the accuracy of the estimation depends on the strength of density regulation in the discrete population dynamics. The estimation of different covariance and population dynamic parameters, with corresponding statistical uncertainties, is demonstrated for case studies of fish and bat communities. We find that species heterogeneity is the main driver of spatial and temporal community similarity in both case studies.
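A minimal Python sketch (illustrative parameter values and names throughout, not the authors' code) of the discrete, density-regulated dynamics with environmental and demographic noise that such intercept-only GLMMs are meant to recover:

import numpy as np

rng = np.random.default_rng(1)
n_species, n_years = 30, 50
delta = 0.2                               # strength of density regulation (0 = none)
r = rng.normal(0.5, 0.1, n_species)       # species-specific growth rates
sigma_env, sigma_dem = 0.3, 0.1           # environmental and demographic noise SDs

x = np.zeros((n_species, n_years))        # log abundances
x[:, 0] = rng.normal(2.0, 0.5, n_species)
for t in range(1, n_years):
    env = rng.normal(0, sigma_env)        # common environmental response in year t
    x[:, t] = (x[:, t - 1] + r - delta * x[:, t - 1] + env
               + rng.normal(0, sigma_dem, n_species))   # species-specific noise

# Counts as they would enter an intercept-only Poisson GLMM with year,
# species, and species-by-year random effects.
counts = rng.poisson(np.exp(x))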
Project description:Linear Mixed Effects (LME) models are powerful statistical tools that have been employed in many real-world applications, such as retail data analytics, marketing measurement, and medical research. Statistical inference is often conducted via maximum likelihood estimation with Normality assumptions on the random effects. Nevertheless, for many applications in the retail industry, it is often necessary to consider non-Normal distributions for the random effects so that the unknown parameters retain their business interpretations. Motivated by this need, we study a linear mixed effects model with possibly non-Normal random effects. We propose a general estimating framework based on a saddlepoint approximation (SA) of the probability density function of the dependent variable, which leads to constrained nonlinear optimization problems. The classical LME model with the Normality assumption can then be viewed as a special case of the proposed general SA framework. Compared with the existing approach, the proposed method enhances the real-world interpretability of the estimates while providing satisfactory model fits.
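A minimal Python sketch of a saddlepoint density approximation in the simplest possible setting (a Gamma density recovered from its cumulant generating function; the distribution and parameters are illustrative, not the paper's LME framework):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma

k, theta = 3.0, 2.0                                   # Gamma(k, theta)
K  = lambda s: -k * np.log(1 - theta * s)             # cumulant generating function
K1 = lambda s: k * theta / (1 - theta * s)            # K'(s)
K2 = lambda s: k * theta**2 / (1 - theta * s)**2      # K''(s)

def sa_density(x):
    s_hat = brentq(lambda s: K1(s) - x, -50, 1 / theta - 1e-9)  # solve K'(s) = x
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * K2(s_hat))

x = 4.0
print(sa_density(x), gamma.pdf(x, a=k, scale=theta))  # SA ~0.139 vs exact ~0.135

In the paper's setting, K would be the cumulant generating function of the dependent variable under the assumed random-effects law, and the optimization would be over the model parameters subject to constraints.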
Project description:We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by a lack of sparseness in the underlying least squares system. For this reason, we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy, but this accuracy must be weighed against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but at the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons, and thus guides users in their choice of variational inference approach depending on problem size and computing resources.
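A minimal numpy sketch of the sparseness obstacle mentioned above (group sizes and labels are illustrative): with two crossed grouping factors, the off-diagonal block of the least squares system is a table of cell counts and is mostly nonzero.

import numpy as np

n, a_levels, b_levels = 200, 10, 8
rng = np.random.default_rng(0)
a = rng.integers(0, a_levels, n)          # labels for crossed factor A
b = rng.integers(0, b_levels, n)          # labels for crossed factor B
Za = np.eye(a_levels)[a]                  # indicator design matrix for A
Zb = np.eye(b_levels)[b]                  # indicator design matrix for B

cross = Za.T @ Zb                         # A-by-B cell counts
print(np.mean(cross != 0))                # ~0.9: the block is essentially dense

With nested rather than crossed effects, this part of the system would be block-diagonal, which is the sparsity that streamlined algorithms normally exploit.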
Project description:Linear mixed-effects models have been widely used in longitudinal data analyses. In practice, the fitting algorithm can fail to converge because of boundary issues with the estimated random-effects covariance matrix G, i.e., it being near-singular, non-positive definite, or both. Currently available algorithms are not computationally optimal because the condition number of the matrix G is unnecessarily increased when the random-effects correlation estimate is not zero. We propose an adaptive fitting (AF) algorithm using an optimal linear transformation of the random-effects design matrix. It is a data-driven adaptive procedure that aims to reduce subsequent random-effects correlation estimates to zero in the optimally transformed estimation space. Simulations show that AF significantly improves convergence properties, especially under small sample sizes, relatively large noise, and high correlation. A real dataset on insulin-like growth factor (IGF) protein is used to illustrate the application of this algorithm, implemented with the R package nlme.
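A minimal numpy sketch (not the authors' AF algorithm) of the underlying reparameterization idea: a linear transformation of the random-effects design matrix Z leaves the fitted values Z @ b unchanged while driving the transformed random-effects correlation to zero.

import numpy as np

G = np.array([[4.0, 3.8],                 # intercept/slope covariance matrix
              [3.8, 4.0]])                # correlation 0.95
print(np.linalg.cond(G))                  # condition number ~39

L = np.linalg.cholesky(G)                 # G = L @ L.T
# Reparameterize: b_star = inv(L) @ b and Z_star = Z @ L, so Z_star @ b_star
# equals Z @ b while Var(b_star) is the identity (zero correlation).
G_star = np.linalg.inv(L) @ G @ np.linalg.inv(L).T
print(np.round(G_star, 10), np.linalg.cond(G_star))   # identity, condition 1

The paper's procedure is data-driven and iterative, since G itself must be estimated, but the effect on the condition number is of this kind.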
Project description:Purpose: The purpose of this study was to estimate the distribution of the true rates of progression (RoP) of visual field (VF) loss. Methods: We analyzed the progression of mean deviation over time in series of ≥ 10 tests from 3352 eyes (one per patient) from 5 glaucoma clinics, using a novel Bayesian hierarchical Linear Mixed Model (LMM); this modeled the random-effect distribution of RoPs as the sum of 2 independent processes following, respectively, a negative exponential distribution (the "true" distribution of RoPs) and a Gaussian distribution (the "noise"), resulting in a skewed exGaussian distribution. The exGaussian-LMM was compared to a standard Gaussian-LMM using the Watanabe-Akaike Information Criterion (WAIC). The random-effect distributions were compared to the empirical cumulative distribution function (eCDF) of linear regression RoPs using a Kolmogorov-Smirnov test. Results: The WAIC indicated a better fit with the exGaussian-LMM (estimate [standard error]: 192174.4 [721.2]) than with the Gaussian-LMM (192595 [697.4]), with a difference of 157.2 [22.6]. There was a significant difference between the eCDF and the Gaussian-LMM distribution (P < 0.0001), but not between the eCDF and the exGaussian-LMM distribution (P = 0.108). The estimated mean (95% credible interval, CI) "true" RoP (-0.377, 95% CI = -0.396 to -0.359 dB/year) was more negative than the observed mean RoP (-0.283, 95% CI = -0.299 to -0.268 dB/year), indicating a bias, likely due to learning, in standard LMMs. Conclusions: The distribution of "true" RoPs can be estimated with an exGaussian-LMM, improving model accuracy. Translational relevance: We used these results to develop a fast and accurate analytical approximation for sample-size calculations in clinical trials using standard LMMs, which was integrated into a freely available web application.
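A minimal Python sketch of the random-effect construction described above, using scipy's exponentially modified normal (sample sizes and parameter values are illustrative, and the full Bayesian hierarchical LMM is not reproduced):

import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(0)
n = 3000
true_rop = -rng.exponential(scale=0.35, size=n)   # "true" RoPs: negative exponential
noise = rng.normal(0.0, 0.25, size=n)             # Gaussian "noise" component
observed = true_rop + noise                       # skewed exGaussian rates, dB/year

# Fit an exGaussian to the sign-flipped rates (exponnorm has its long tail on
# the right); K * scale recovers the mean of the exponential component.
K, loc, scale = exponnorm.fit(-observed)
print(K * scale, scale)                           # ~0.35 and ~0.25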
Project description:This paper presents a Bayesian analysis of linear mixed models for quantile regression using a modified Cholesky decomposition for the covariance matrix of random effects and an asymmetric Laplace distribution for the error distribution. We consider several novel Bayesian shrinkage approaches for both fixed and random effects in a linear mixed quantile model using extended L1 penalties. To improve mixing of the Markov chains, a simple and efficient partially collapsed Gibbs sampling algorithm is developed for posterior inference. We also extend the framework to a Bayesian mixed expectile model and develop a Metropolis-Hastings acceptance-rejection (MHAR) algorithm using proposal densities based on iteratively weighted least squares estimation. The proposed approach is then illustrated via both simulated and real data examples. Results indicate that the proposed approach performs very well in comparison to the other approaches.
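A minimal Python sketch of the normal-exponential mixture representation of the asymmetric Laplace distribution that makes the Gibbs steps conjugate (a standard result; the quantile level and location are illustrative, and the paper's collapsed sampler is not reproduced):

import numpy as np

rng = np.random.default_rng(0)
p = 0.75                                    # quantile level
theta = (1 - 2 * p) / (p * (1 - p))
tau = np.sqrt(2 / (p * (1 - p)))

w = rng.exponential(1.0, size=200_000)      # exponential mixing variable
u = rng.normal(size=200_000)
y = 1.5 + theta * w + tau * np.sqrt(w) * u  # ALD(1.5, 1, p) draws

print(np.quantile(y, p))                    # ~1.5: the p-th quantile sits at mu

Conditional on w, y is Gaussian, which is what allows standard conjugate updates for the fixed and random effects inside the Gibbs sampler.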
Project description:We propose a new class of generalized linear mixed models with Gaussian mixture random effects for clustered data. To overcome weak identifiability issues, we fit the model using a penalized Expectation-Maximization (EM) algorithm and develop sequential locally restricted likelihood ratio tests to determine the number of components in the Gaussian mixture. Our work is motivated by an application to nationwide kidney transplant center evaluation in the United States, where patient-level post-surgery outcomes are repeated measures reflecting the care quality of the transplant centers. By taking into account patient-level risk factors and modeling the center effects with a finite Gaussian mixture model, the proposed model provides a convenient framework for studying the heterogeneity among transplant centers and controls the false discovery rate when screening for transplant centers with non-standard performance.
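A much-simplified Python sketch of EM for a two-component Gaussian mixture on scalar center effects (plain, unpenalized EM with illustrative data; the paper's penalized EM and sequential tests build on updates of this form):

import numpy as np

rng = np.random.default_rng(0)
b = np.concatenate([rng.normal(0.0, 0.3, 80),     # "standard" centers
                    rng.normal(1.5, 0.3, 20)])    # "non-standard" centers

pi, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: responsibility of each component for each center effect
    dens = pi * np.exp(-0.5 * ((b[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted maximum likelihood updates
    nk = r.sum(axis=0)
    pi, mu = nk / len(b), (r * b[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (b[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(pi, 2), np.round(mu, 2), np.round(sd, 2))  # ~[0.8, 0.2], [0, 1.5]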
Project description:HIV RNA viral load measures are often subject to upper and lower detection limits, depending on the quantification assay. Hence, the responses are either left- or right-censored. Linear (and nonlinear) mixed-effects models, with modifications to accommodate censoring, are routinely used to analyze this type of data and are based on normality assumptions for the random terms. However, such analyses may not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear (and nonlinear) mixed-effects models, replacing the Gaussian assumptions for the random terms with normal/independent (NI) distributions. The NI class is an attractive family of symmetric, heavy-tailed densities that includes the normal, Student's-t, slash, and contaminated normal distributions as special cases. The marginal likelihood is tractable (using approximations for nonlinear models) and can be used to develop Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated with two HIV/AIDS studies of viral loads that were initially analyzed using normal (censored) mixed-effects models, as well as with simulations.
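A minimal Python sketch of the normal/independent construction itself (a standard representation, not code from the paper): y = mu + u**(-1/2) * z with z Gaussian and a positive mixing variable u, where different mixing laws give different NI members.

import numpy as np

rng = np.random.default_rng(0)
n, nu = 100_000, 4
z = rng.normal(size=n)

u_t     = rng.gamma(nu / 2, 2 / nu, size=n)   # u ~ Gamma(nu/2, nu/2): Student's-t
u_slash = rng.beta(nu, 1, size=n)             # u ~ Beta(nu, 1): slash
u_norm  = np.ones(n)                          # u = 1: back to the normal

t_draws = z / np.sqrt(u_t)                    # heavy-tailed Student's-t_4 samples
print(np.var(t_draws), nu / (nu - 2))         # sample vs theoretical variance (= 2)

The same mixing-variable trick is what keeps the censored-data likelihood tractable, since conditioning on u returns the model to the Gaussian case.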
Project description:In this manuscript we develop a deep learning algorithm to improve estimation of rates of progression and prediction of future patterns of visual field loss in glaucoma. A generalized variational auto-encoder (VAE) was trained to learn a low-dimensional representation of standard automated perimetry (SAP) visual fields using 29,161 fields from 3,832 patients. The VAE was trained on a 90% sample of the data, with randomization at the patient level. Using the remaining 10%, rates of progression and predictions were generated and compared with SAP mean deviation (MD) rates and point-wise (PW) regression predictions, respectively. The longitudinal rate of change through the VAE latent space (e.g., with eight dimensions) detected a significantly higher proportion of progression than MD at two (25% vs. 9%) and four (35% vs. 15%) years from baseline. Early in follow-up, the VAE improved prediction over PW, with significantly smaller mean absolute error in predicting the 4th, 6th, and 8th visits from the first three (e.g., visit eight: VAE8: 5.14 dB vs. PW: 8.07 dB; P < 0.001). A deep VAE can be used to assess both rates and trajectories of progression in glaucoma, with the additional benefit of being a generative technique capable of predicting future patterns of visual field damage.
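A minimal PyTorch sketch of a VAE with an eight-dimensional latent space (layer sizes, the 52-point input dimension, and all training details are illustrative assumptions, not the paper's architecture):

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_points=52, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_points, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, latent), nn.Linear(64, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_points))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def loss_fn(recon, x, mu, logvar):
    mse = ((recon - x) ** 2).sum()                            # reconstruction term
    kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()  # KL to N(0, I)
    return mse + kld

vae = VAE()
x = torch.randn(16, 52)                    # a dummy batch of "visual fields"
recon, mu, logvar = vae(x)
print(loss_fn(recon, x, mu, logvar).item())

Rates of progression in the latent space would then be obtained by encoding each visit's field and regressing the latent coordinates on time.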