Project description:Models of electrical activity in cardiac cells have become important research tools as they can provide a quantitative description of detailed and integrative physiology. However, cardiac cell models have many parameters, and how uncertainties in these parameters affect the model output is difficult to assess without undertaking large numbers of model runs. In this study we show that a surrogate statistical model of a cardiac cell model (the Luo-Rudy 1991 model) can be built using Gaussian process (GP) emulators. Using this approach we examined how eight outputs describing the action potential shape and action potential duration restitution depend on six inputs, which we selected to be the maximum conductances in the Luo-Rudy 1991 model. We found that the GP emulators could be fitted to a small number of model runs, and behaved as would be expected based on the underlying physiology that the model represents. We have shown that an emulator approach is a powerful tool for uncertainty and sensitivity analysis in cardiac cell models.
Project description:Biophysically detailed cardiac cell models reconstruct the action potential and calcium dynamics of cardiac myocytes. They aim to capture the biophysics of current flow through ion channels, pumps, and exchangers in the cell membrane, and are therefore highly detailed. However, because the models are both complex and non-linear, the relationship between model parameters and model outputs is difficult to establish, and the consequences of uncertainty and variability in model parameters are difficult to determine without undertaking large numbers of model evaluations. The aim of the present study was to demonstrate how sensitivity and uncertainty analysis using Gaussian process emulators can provide a systematic and quantitative analysis of biophysically detailed cardiac cell models. We selected the Courtemanche and Maleckar models of the human atrial action potential for analysis because these models describe a similar set of currents with different formulations. In our approach, Gaussian processes emulate the main features of the action potential and calcium transient. The emulators were trained with a set of design data comprising samples from parameter space and corresponding model outputs, initially obtained from 300 model evaluations. Variance-based sensitivity indices were calculated using the emulators, and first-order and total-effect indices were calculated for each combination of parameter and output. The differences between the first-order and total-effect indices indicated that the effect of interactions between parameters was small. A second set of emulators was then trained using a new set of design data restricted to the model parameters with a sensitivity index greater than 0.1 (10%). This second-stage analysis enabled a comparison of mechanisms in the two models: the second-stage sensitivity indices quantified the relationship between the L-type Ca2+ current and the action potential plateau in each model.
Our quantitative analysis predicted that changes in the maximum conductance of the ultra-rapid K+ current IKur would have opposite effects on action potential duration in the two models, and this prediction was confirmed by additional simulations. This study has demonstrated that Gaussian process emulators are an effective tool for sensitivity and uncertainty analysis of biophysically detailed cardiac cell models.
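The emulator-plus-sensitivity workflow described in the two studies above can be sketched in a few lines. The following is a minimal, hypothetical illustration: a toy two-input "simulator" stands in for the cardiac model, a plain NumPy GP posterior mean serves as the emulator, and first-order Sobol indices are estimated with a Saltelli-style pick-freeze estimator. None of the numbers relate to the actual models.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    """Squared-exponential kernel between two sets of inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_fit(X, y, jitter=1e-6):
    """Precompute what is needed for GP posterior-mean prediction."""
    alpha = np.linalg.solve(rbf(X, X) + jitter * np.eye(len(X)), y)
    return X, alpha

def gp_mean(model, Xs):
    X, alpha = model
    return rbf(Xs, X) @ alpha

rng = np.random.default_rng(0)
simulator = lambda X: X[:, 0] + 0.1 * X[:, 1]   # toy model: input 0 dominates

# "Design data": a small set of simulator runs used to train the emulator.
Xtr = rng.uniform(0, 1, (40, 2))
emulator = gp_fit(Xtr, simulator(Xtr))

# First-order Sobol indices evaluated on the cheap emulator.
N = 8000
A, B = rng.uniform(0, 1, (N, 2)), rng.uniform(0, 1, (N, 2))
yA, yB = gp_mean(emulator, A), gp_mean(emulator, B)
S1 = []
for i in range(2):
    AB = B.copy()
    AB[:, i] = A[:, i]                          # freeze input i at A's values
    S1.append(np.mean(yA * (gp_mean(emulator, AB) - yB)) / np.var(yA))
```

Because the toy simulator is dominated by its first input, S1[0] comes out near 1 and S1[1] near 0; the point of the emulator is that the 2 x 8000 Monte Carlo evaluations never touch the expensive simulator.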
Project description:In order to identify and characterize novel human gene expression responses to glucocorticoids, we exposed the human lung adenocarcinoma cell line A549 to the synthetic glucocorticoid dexamethasone for 1, 3, 5, 7, 9, and 11 hours, as well as to a paired vehicle control, ethanol. We assayed gene expression with RNA-seq and clustered gene expression profiles using an infinite Gaussian process mixture model.
Project description:Manual analysis of human high-resolution colonic manometry data is time-consuming, non-standardized, and subject to laboratory bias. In this article we present a technique for spectral analysis and statistical inference of quasiperiodic spatiotemporal signals recorded during colonic manometry procedures. Spectral analysis is achieved by computing the continuous wavelet transform and cross-wavelet transform of these signals. Statistical inference is achieved by modeling the resulting time-averaged amplitudes in the frequency and frequency-phase domains as Gaussian processes over a regular grid, under the influence of categorical and numerical predictors specified by the experimental design, as a functional mixed-effects model. Parameters of the model are inferred with Hamiltonian Monte Carlo. Using this method, we re-analyzed our previously published colonic manometry data, comparing healthy controls with patients with slow-transit constipation. The output from our automated method supports and adds to our previous manual analysis. Obtaining these results took less than two days; in comparison, the manual analysis took five weeks. The proposed mixed-effects model approach can also be used to gain an appreciation of cyclical activity in individual subjects during control periods and in response to any form of intervention.
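As a deliberately simplified stand-in for the spectral step (the article uses continuous and cross-wavelet transforms, which resolve frequency over time), the snippet below recovers the dominant cyclic frequency of a synthetic, manometry-like quasiperiodic trace using a plain FFT amplitude spectrum. All signal parameters are invented for illustration.

```python
import numpy as np

fs = 10.0                                   # sampling rate in Hz (hypothetical)
t = np.arange(0, 60, 1 / fs)                # one minute of recording
rng = np.random.default_rng(1)

# Synthetic quasiperiodic pressure trace: 3 cycles/min plus measurement noise.
sig = np.sin(2 * np.pi * (3 / 60) * t) + 0.2 * rng.normal(size=t.size)

amp = np.abs(np.fft.rfft(sig)) / t.size     # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
peak_hz = freqs[np.argmax(amp[1:]) + 1]     # dominant frequency, skipping DC
```

For this trace the peak lands at 0.05 Hz (3 cycles per minute); a wavelet transform would additionally show how that amplitude waxes and wanes over time, which is what the paper's Gaussian process layer then models.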
Project description:Motivation: The location, timing, and abundance of gene expression (both mRNA and proteins) within a tissue define the molecular mechanisms of cell functions. Recent technology breakthroughs in spatial molecular profiling, including imaging-based and sequencing-based technologies, have enabled the comprehensive molecular characterization of single cells while preserving their spatial and morphological contexts. This new bioinformatics scenario calls for effective and robust computational methods to identify genes with spatial patterns. Results: We present a novel Bayesian hierarchical model to analyze spatial transcriptomics data, with several unique characteristics. It models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model, which greatly increases model stability and robustness. In addition, the Bayesian inference framework allows us to borrow strength in parameter estimation in a de novo fashion. As a result, the proposed model shows competitive performance in accuracy and robustness over existing methods in both simulation studies and two real data applications. Availability and implementation: The related R/C++ source code is available at https://github.com/Minzhe/BOOST-GP. Supplementary information: Supplementary data are available at Bioinformatics online.
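The zero-inflated negative binomial likelihood at the core of the model can be written down directly. The sketch below uses a generic mean-dispersion parameterization; it is not taken from the BOOST-GP source.

```python
from math import exp, lgamma, log

def zinb_logpmf(y, mu, phi, pi):
    """Log-pmf of a zero-inflated negative binomial count.

    mu:  NB mean; phi: dispersion (larger = closer to Poisson);
    pi:  probability of a structural (technical) zero.
    """
    # Plain NB(mu, phi) log-pmf.
    nb = (lgamma(y + phi) - lgamma(phi) - lgamma(y + 1)
          + phi * log(phi / (phi + mu)) + y * log(mu / (phi + mu)))
    if y == 0:
        # A zero can be structural (prob pi) or sampled from the NB.
        return log(pi + (1 - pi) * exp(nb))
    return log(1 - pi) + nb
```

The zero-inflation component inflates the probability mass at y = 0 relative to a plain NB with the same mean and dispersion, which is exactly how the model absorbs dropout-like technical zeros in spatial count data.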
Project description:Motivation: Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP) heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. Results: We introduce a completely tuning-free Bayesian Gaussian process (GP) based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov chain Monte Carlo method which allows full uncertainty quantification. Several datasets are analysed, and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which 'borrows strength' from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model implemented in the MTG2 and BLUPF90 software, and quantitative measures indicate the superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate the scalability of the proposed method on simulated data with tens of thousands of individuals. Availability and implementation: The C++ implementation dynBGP and simulated data are available on GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in the QTL Archive: https://phenome.jax.org/centers/QTLA. Supplementary information: Supplementary data are available at Bioinformatics online.
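The quantity being tracked over time is a ratio of variance components, h2(t) = var_g(t) / (var_g(t) + var_e(t)). A toy illustration, with invented smooth curves standing in for the GP-smoothed variance-component estimates:

```python
import numpy as np

t = np.linspace(0, 10, 50)              # time points (hypothetical units)
var_g = 1.0 + 0.5 * np.sin(t / 2)       # smoothed additive-genetic variance (invented)
var_e = 1.0 + 0.1 * t                   # smoothed residual variance (invented)
h2 = var_g / (var_g + var_e)            # SNP-heritability as a function of time
```

Because both components are modeled jointly as smooth functions, the resulting h2 curve is itself smooth, which is what lets the joint method produce narrower credible intervals than per-time-point estimation followed by smoothing.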
Project description:Gaussian processes (GPs) are common components in Bayesian non-parametric models, having a rich methodological literature and strong theoretical grounding. The use of exact GPs in Bayesian models is limited to problems containing at most several thousand observations due to their prohibitive computational demands. We develop a posterior sampling algorithm using H-matrix approximations that scales as O(n log^2 n). We show that this approximation's Kullback-Leibler divergence to the true posterior can be made arbitrarily small. Though multidimensional GPs could be used with our algorithm, d-dimensional surfaces are modeled as tensor products of univariate GPs to minimize the cost of matrix construction and maximize computational efficiency. We illustrate the performance of this fast increased-fidelity approximate GP, FIFA-GP, using both simulated and non-synthetic data sets.
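The H-matrix sampler itself is involved, but the tensor-product structure the abstract mentions can be illustrated with the classic Kronecker identity: for a kernel on a 2-D grid, K = Kx (x) Ky, linear solves factor into per-axis eigendecompositions and the full n x n matrix never needs to be formed. A minimal sketch of that trick (not FIFA-GP itself):

```python
import numpy as np

def kron_solve(Kx, Ky, y):
    """Solve (Kx kron Ky) a = y via per-axis eigendecompositions.

    Cost is O(nx^3 + ny^3) instead of O((nx*ny)^3) for the full matrix.
    """
    wx, Vx = np.linalg.eigh(Kx)
    wy, Vy = np.linalg.eigh(Ky)
    Y = y.reshape(len(wx), len(wy))
    Z = Vx.T @ Y @ Vy                 # rotate into the joint eigenbasis
    Z /= np.outer(wx, wy)             # divide by the Kronecker eigenvalues
    return (Vx @ Z @ Vy.T).ravel()

# Small demonstration: two univariate RBF kernel matrices with jitter.
x = np.linspace(0, 1, 6)
z = np.linspace(0, 1, 5)
Kx = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3 ** 2) + 1e-3 * np.eye(6)
Ky = np.exp(-0.5 * (z[:, None] - z[None, :]) ** 2 / 0.3 ** 2) + 1e-3 * np.eye(5)
rhs = np.random.default_rng(3).normal(size=30)
a = kron_solve(Kx, Ky, rhs)
```

Multiplying the solution back through the explicit Kronecker product reproduces the right-hand side, confirming that only 6x6 and 5x5 factorizations were needed for the 30x30 system.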
Project description:Methanol production has gained considerable interest at both laboratory and industrial scale, as methanol is a renewable fuel and an excellent hydrogen energy storehouse. The formation of synthesis gas (CO/H2) and the conversion of synthesis gas to methanol are the two basic catalytic processes used in methanol production. Machine learning (ML) approaches have recently emerged as powerful tools in reaction informatics. Inspired by these, we employ Gaussian process regression (GPR) to model the conversion of carbon monoxide (CO) and the selectivity of the methanol product using data sets obtained from experimental investigations, capturing the uncertainty in prediction values. The results indicate that the proposed GPR model predicts CO conversion and methanol selectivity more accurately than other ML models. Further, the factors that influence the predictions are identified from the best GPR model using "Shapley Additive exPlanations" (SHAP). After interpretation, the essential input features are found to be the inlet mole fraction of CO (Y_CO,in) and the net inlet flow rate (F_in, in nL/min) for our best-prediction GPR models, irrespective of the data set. These interpretable models are employed for Bayesian optimization in a weighted multi-objective framework to obtain the optimal operating points, namely, simultaneous maximization of selectivity and conversion.
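A minimal GPR sketch in the same spirit, with an invented one-dimensional stand-in for the catalysis data (normalized flow rate mapped to CO conversion), showing the feature the abstract highlights: the predictive variance grows where the model extrapolates beyond the data.

```python
import numpy as np

def gpr(Xtr, ytr, Xs, ls=0.5, sig2=1.0, noise=1e-4):
    """GP regression: posterior mean and variance at test inputs Xs."""
    k = lambda a, b: sig2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)
    K = k(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = k(Xs, Xtr)
    mean = Ks @ np.linalg.solve(K, ytr)
    # diag(Ks K^-1 Ks^T) without forming the full matrix product.
    var = sig2 + noise - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Invented training data: normalized inlet flow rate vs. CO conversion.
Xtr = np.linspace(0.0, 1.0, 8)
ytr = 0.2 + 0.6 * Xtr * (1 - 0.3 * Xtr)
mean, var = gpr(Xtr, ytr, np.array([0.5, 2.0]))
```

At the interpolation point (0.5) the variance collapses toward the noise level, while at the extrapolation point (2.0) it returns toward the prior variance; that variance is what a Bayesian-optimization loop over operating conditions would consume.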
Project description:Motivation: Recent advances in high-dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes quantitative trait locus (QTL) studies of function-valued traits, such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very limited choices of software tools are currently available for practical implementation of functional QTL mapping and variable selection. Results: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients that describe how the effects of molecular markers on the quantitative trait change over time. We use an efficient gradient-based algorithm to estimate the tuning parameters of the GPs. Notably, the GP approach is directly applicable to incomplete datasets with missing data rates above 50% (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GPs and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in the tested datasets. Availability and implementation: Software and simulated data are available as a MATLAB package 'GPQTLmapping', which can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in the case studies are publicly available at the QTL Archive. Supplementary information: Supplementary data are available at Bioinformatics online.
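The core modeling idea, a marker effect that is itself a smooth function of time drawn from a GP, can be sketched with simulated data (all sizes and values invented; the paper estimates the curve jointly rather than per time point as done here).

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 30)                           # measurement times
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.3) ** 2) + 1e-6 * np.eye(30)
beta = np.linalg.cholesky(K) @ rng.normal(size=30)  # time-varying QTL effect ~ GP(0, K)

x = rng.integers(0, 3, size=100).astype(float)      # marker genotypes coded 0/1/2
Y = x[:, None] * beta[None, :] + 0.1 * rng.normal(size=(100, 30))

# Naive per-time-point least-squares estimate of the effect curve.
xc = x - x.mean()
beta_hat = (xc @ Y) / (xc @ x)
```

With 100 individuals and modest noise, even the naive per-time-point estimate tracks the underlying GP-drawn effect curve closely; the GP prior's role in the actual method is to share information across time points, which matters most when many phenotype values are missing.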
Project description:Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.
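The engine of model-based clustering methods like DPGP is the GP marginal likelihood of each gene's trajectory under a candidate cluster's kernel: trajectories that look like coherent temporal responses score far higher than unstructured ones. A self-contained sketch with a generic RBF kernel (not the DP_GP_cluster implementation):

```python
import numpy as np

def gp_logml(t, y, ls=0.5, var=1.0, noise=0.1):
    """GP marginal log-likelihood of one expression trajectory y over times t."""
    K = var * np.exp(-0.5 * ((t[:, None] - t[None, :]) / ls) ** 2) \
        + noise * np.eye(len(t))
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(t) * np.log(2 * np.pi))

t = np.linspace(0, 1, 20)
smooth = np.sin(np.pi * t)                     # coherent, slowly varying response
rough = 0.7 * np.array([1.0, -1.0] * 10)       # point-to-point oscillation, no smooth trend
```

Under the smooth kernel, the coherent trajectory receives a much higher marginal likelihood than the oscillating one; in a Dirichlet process mixture, this quantity (combined with the CRP prior over cluster assignments) drives which cluster each gene joins.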