Project description: Motivated by two case studies using primary care records from the Clinical Practice Research Datalink, we describe statistical methods that facilitate the analysis of tall data, i.e. data with very large numbers of observations. Our focus is on investigating the association between patient characteristics and an outcome of interest while allowing for variation among general practices. We explore ways to fit mixed-effects models to tall data, including predictors of interest and confounding factors as covariates, and including random intercepts to allow for heterogeneity in outcome among practices. We introduce (1) weighted regression and (2) meta-analysis of regression coefficients estimated separately within each practice. Both methods reduce the size of the dataset, thus decreasing the time required for statistical analysis. We compare the methods to an existing subsampling approach. All methods give similar point estimates; weighted regression and meta-analysis give standard errors similar to those from analysis of the entire dataset, whereas the subsampling method gives larger standard errors. Where all data are discrete, weighted regression is equivalent to fitting the mixed model to the entire dataset. In the presence of a continuous covariate, meta-analysis is useful. Both methods are easy to implement in standard statistical software.
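A minimal sketch of the meta-analysis route, assuming per-practice coefficient estimates and standard errors are already available; the values and the DerSimonian-Laird pooling shown here are illustrative, not taken from the study:

```python
import numpy as np

def pool_coefficients(beta, se):
    """Inverse-variance pooling of one regression coefficient estimated separately
    in each practice; a moment-based (DerSimonian-Laird) between-practice variance
    tau^2 allows for heterogeneity, mirroring the random intercepts in the mixed model."""
    w = 1.0 / se**2
    beta_fixed = np.sum(w * beta) / np.sum(w)
    Q = np.sum(w * (beta - beta_fixed) ** 2)            # heterogeneity statistic
    k = len(beta)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (se**2 + tau2)                        # random-effects weights
    beta_pooled = np.sum(w_star * beta) / np.sum(w_star)
    se_pooled = np.sqrt(1.0 / np.sum(w_star))
    return beta_pooled, se_pooled

# Hypothetical per-practice estimates of the same coefficient
beta_hat = np.array([0.42, 0.35, 0.51, 0.39])
se_hat = np.array([0.08, 0.11, 0.09, 0.07])
print(pool_coefficients(beta_hat, se_hat))
```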
Project description: To reduce airspace congestion and flight delay simultaneously, this paper formulates the airway network flow assignment (ANFA) problem as a multiobjective optimization model and presents a new multiobjective optimization framework to solve it. First, an effective multi-island parallel evolution algorithm with multiple evolving populations is employed to improve optimization capability. Second, the non-dominated sorting genetic algorithm II is applied within each population. In addition, a cooperative coevolution algorithm is adapted to divide the ANFA problem into several low-dimensional biobjective optimization problems that are easier to solve. Finally, to maintain solution diversity and avoid premature convergence, a dynamic adjustment operator based on solution congestion degree is designed specifically for the ANFA problem. Simulation results using real traffic data from the China air route network and daily flight plans demonstrate that the proposed approach improves solution quality effectively and outperforms existing approaches, including the multiobjective genetic algorithm, the well-known multiobjective evolutionary algorithm based on decomposition, a cooperative coevolution multiobjective algorithm, and parallel evolution algorithms with different migration topologies.
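As a small illustration of the non-dominated sorting step that underlies NSGA-II, here is a sketch of extracting the Pareto front for a biobjective minimisation; the objective values below are placeholders, not outputs of the ANFA model:

```python
import numpy as np

def pareto_front(objs):
    """Return indices of non-dominated candidates when minimising both objectives.
    objs: (n, 2) array of (congestion, delay) values, one row per flow assignment."""
    nondominated = np.ones(len(objs), dtype=bool)
    for i in range(len(objs)):
        for j in range(len(objs)):
            if i != j and np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i]):
                nondominated[i] = False                  # candidate i is dominated by j
                break
    return np.where(nondominated)[0]

# Hypothetical (congestion, delay) values for five candidate assignments
objs = np.array([[3.0, 9.0], [4.0, 7.0], [5.0, 5.0], [6.0, 6.0], [7.0, 4.0]])
print(pareto_front(objs))   # -> [0 1 2 4]; candidate 3 is dominated by candidate 2
```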
Project description: Large observational databases derived from disease registries and retrospective cohort studies have proven very useful for the study of health services utilization. However, the use of large databases may introduce computational difficulties, particularly when the event of interest is recurrent. In such settings, grouping the recurrent event data into prespecified intervals leads to a flexible event rate model and a data reduction that remedies the computational issues. We propose a possibly stratified marginal proportional rates model with a piecewise-constant baseline event rate for recurrent event data. Both the absence and the presence of a terminal event are considered. Large-sample distributions are derived for the proposed estimators. Simulation studies are conducted under various data configurations, including settings in which the model is misspecified. Guidelines for interval selection are provided and assessed using numerical studies. We then show that the proposed procedures can be carried out using standard statistical software (e.g., SAS, R). An application based on national hospitalization data for end-stage renal disease patients is provided.
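A minimal sketch of the data-reduction step, assuming event times and follow-up are grouped into prespecified intervals so that the piecewise-constant rate in each interval is total events divided by total person-time; variable names are illustrative, and covariates would enter via a Poisson-type regression with a log person-time offset rather than the simple ratio shown here:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per patient, with recurrent event times and end of follow-up
patients = pd.DataFrame({
    "id": [1, 2, 3],
    "followup": [10.0, 6.0, 8.0],
    "event_times": [[1.2, 4.5, 9.0], [2.0], [3.3, 7.7]],
})
cuts = np.array([0.0, 3.0, 6.0, 10.0])                  # prespecified interval boundaries

rows = []
for _, p in patients.iterrows():
    for k in range(len(cuts) - 1):
        lo, hi = cuts[k], cuts[k + 1]
        at_risk = max(0.0, min(p["followup"], hi) - lo)  # person-time spent in interval k
        events = sum(lo <= t < hi for t in p["event_times"] if t < p["followup"])
        rows.append({"id": p["id"], "interval": k, "events": events, "time": at_risk})
grouped = pd.DataFrame(rows)

# Piecewise-constant baseline rate: events per unit person-time within each interval
totals = grouped.groupby("interval")[["events", "time"]].sum()
print(totals["events"] / totals["time"])
```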
Project description: The molecular simulation of chemical reaction equilibrium (CRE) is a challenging and important problem of broad applicability in chemistry and chemical engineering. The primary molecular-based approach for solving this problem has been the reaction ensemble Monte Carlo (REMC) algorithm [Turner et al. Molec. Simulation 2008, 34(2), 119-146], based on classical force-field methodology. In spite of the vast improvements in computer hardware and software since its original development almost 25 years ago, its more widespread application is impeded by its computational inefficiency. A fundamental problem is that its MC basis inhibits significant parallelization, and its successful implementation often requires system-specific tailoring and the incorporation of special MC approaches such as replica exchange, expanded ensemble, umbrella sampling, configurational bias, and continuous fractional component methodologies. We describe herein a novel CRE algorithm (reaction ensemble molecular dynamics, ReMD) that exploits modern computer hardware and software capabilities, and which can be straightforwardly implemented for systems of arbitrary size and complexity by leveraging the parallel computing methodology incorporated within many MD software packages (herein, we use GROMACS for illustrative purposes). The ReMD algorithm utilizes these features in the context of a macroscopically inspired and generally applicable free energy minimization approach based on the iterative approximation of the system Gibbs free energy function by a mathematically simple convex ideal solution model, using the composition at each iteration as a reference state. Finally, we describe a simple and computationally efficient a posteriori method to estimate the equilibrium concentrations of species present in very small amounts relative to others in the primary calculation. To demonstrate the algorithm, we show its application to two classic example systems considered previously in the literature: the N2-O2-NO system and the ammonia synthesis system.
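To illustrate the macroscopic idea that ReMD builds on, here is a minimal sketch of ideal-solution Gibbs free energy minimisation under element-balance constraints for the ammonia synthesis system; the dimensionless standard chemical potentials are placeholders, not values used in the paper, and this is not the ReMD algorithm itself:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch for N2 + 3 H2 <-> 2 NH3 with an ideal-solution Gibbs free energy model.
species = ["N2", "H2", "NH3"]
mu0 = np.array([0.0, 0.0, -5.0])           # hypothetical standard chemical potentials / RT
A = np.array([[2, 0, 1],                    # N atoms per molecule of each species
              [0, 2, 3]])                   # H atoms per molecule of each species
n0 = np.array([1.0, 3.0, 0.0])              # initial moles
b = A @ n0                                  # total moles of each element (conserved)

def gibbs(n):
    """Dimensionless ideal-solution Gibbs free energy G/RT = sum n_i (mu0_i + ln x_i)."""
    n = np.clip(n, 1e-12, None)             # guard against log(0)
    return np.sum(n * (mu0 + np.log(n / n.sum())))

cons = {"type": "eq", "fun": lambda n: A @ n - b}   # element balances
res = minimize(gibbs, n0 + 0.1, constraints=[cons],
               bounds=[(1e-12, None)] * 3, method="SLSQP")
print(dict(zip(species, res.x)))            # equilibrium mole numbers under this toy model
```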
Project description: In many contexts we may be interested in understanding whether direct connections between agents, such as declared friendships in a classroom or family links in a rural village, affect their outcomes. In this paper, we review the literature studying econometric methods for the analysis of linear models of social effects, a class that includes the 'linear-in-means' local average model, the local aggregate model, and models where network statistics affect outcomes. We provide an overview of the underlying theoretical models, before discussing conditions for identification using observational and experimental/quasi-experimental data.
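For concreteness, the local-average 'linear-in-means' specification in this class of models can be written (in our notation, not necessarily that of the paper) as

$$ y_i \;=\; \alpha \;+\; \beta\,\frac{1}{|N_i|}\sum_{j \in N_i} y_j \;+\; \gamma\, x_i \;+\; \delta\,\frac{1}{|N_i|}\sum_{j \in N_i} x_j \;+\; \varepsilon_i, $$

where $N_i$ is the set of agents directly connected to $i$, $\beta$ captures the endogenous (peer-outcome) effect and $\delta$ the exogenous (contextual) effect; identification hinges on separating these from correlated effects.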
Project description: Background: Flux variability analysis is often used to determine the robustness of metabolic models under various simulation conditions. However, its use has been somewhat limited by the long computation time compared to other constraint-based modeling methods. Results: We present an open-source implementation of flux variability analysis called fastFVA. This efficient implementation makes large-scale flux variability analysis feasible and tractable, allowing more complex biological questions regarding network flexibility and robustness to be addressed. Conclusions: Networks involving thousands of biochemical reactions can be analyzed within seconds, greatly expanding the utility of flux variability analysis in systems biology.
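A minimal sketch of what flux variability analysis computes, using scipy's generic LP solver on a toy stoichiometric matrix; fastFVA itself is a compiled implementation built for constraint-based (COBRA-style) models and additionally fixes the objective at its optimum, none of which is reproduced here:

```python
import numpy as np
from scipy.optimize import linprog

def fva(S, lb, ub):
    """For each reaction j, find the minimum and maximum flux v_j consistent with
    steady state S v = 0 and the bounds lb <= v <= ub."""
    m, n = S.shape
    bounds = list(zip(lb, ub))
    ranges = np.zeros((n, 2))
    for j in range(n):
        c = np.zeros(n)
        c[j] = 1.0                                         # objective: v_j
        lo = linprog(c, A_eq=S, b_eq=np.zeros(m), bounds=bounds)    # minimise v_j
        hi = linprog(-c, A_eq=S, b_eq=np.zeros(m), bounds=bounds)   # maximise v_j
        ranges[j] = [lo.fun, -hi.fun]
    return ranges

# Toy network: two reactions producing one metabolite, one reaction consuming it
S = np.array([[1.0, 1.0, -1.0]])
print(fva(S, lb=[0, 0, 0], ub=[10, 10, 10]))
```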
Project description: In this paper, we propose a deep convolutional neural network for camera-based wildfire detection. We train the neural network via transfer learning and use a window-based analysis strategy to increase the fire detection rate. To achieve computational efficiency, we compute the frequency response of the kernels in the convolutional and dense layers and eliminate filters with low-energy impulse responses. Moreover, to reduce storage requirements for edge devices, we compare the convolutional kernels in the Fourier domain and discard similar filters using the cosine similarity measure in the frequency domain. We test the performance of the neural network on a variety of wildfire video clips; the pruned system performs as well as the regular network in daytime wildfire detection and also works well on some nighttime wildfire video clips.
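A minimal sketch of the two pruning criteria described above, operating on a stack of convolutional kernels; the thresholds and the padded FFT size are illustrative choices, not the paper's settings:

```python
import numpy as np

def prune_kernels(kernels, energy_thresh=1e-3, sim_thresh=0.98):
    """kernels: array of shape (num_filters, k, k).
    Step 1: drop filters whose frequency response carries little energy.
    Step 2: among the survivors, drop filters whose Fourier-domain responses are
    nearly identical according to cosine similarity. Returns indices of kept filters."""
    freq = np.fft.fft2(kernels, s=(16, 16))                # zero-padded frequency response
    flat = np.abs(freq).reshape(len(kernels), -1)
    energy = (flat ** 2).sum(axis=1)
    survivors = [i for i in range(len(kernels)) if energy[i] > energy_thresh * energy.max()]
    kept = []
    for i in survivors:
        duplicate = any(
            np.dot(flat[i], flat[j]) / (np.linalg.norm(flat[i]) * np.linalg.norm(flat[j]))
            > sim_thresh
            for j in kept
        )
        if not duplicate:
            kept.append(i)
    return kept

# Hypothetical layer with 32 random 3x3 kernels
print(prune_kernels(np.random.randn(32, 3, 3)))
```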
Project description: Metal-organic frameworks (MOFs), characterized by dynamic metal-ligand coordination bonding, play pivotal roles in catalysis, gas storage, and separation processes, owing to their open metal sites (OMSs). These sites, however, are frequently occupied by Lewis-base solvent molecules, necessitating activation to expose the OMSs for practical applications. Traditional thermal activation methods involve harsh conditions, risking structural integrity. This study presents a novel 'gas-flow activation' technique using inert gases such as nitrogen and argon to eliminate these coordinating solvent molecules at low temperatures, thereby maintaining the structural integrity of MOFs. We specifically explored this method with HKUST-1, demonstrating that gas-flow activation at mild temperatures is not only feasible but also more efficient than conventional thermal methods. This approach highlights the potential for safer, more efficient activation processes in MOF applications, making it a valuable addition to the repertoire of MOF activation techniques. This inert-gas-flow activation enables HKUST-1 to act as a catalyst for the hydrogenation of acetophenone even at room temperature. In addition, it is demonstrated that this 'gas-flow activation' is broadly applicable to other MOFs such as MOF-14 and UTSA-76. Furthermore, the findings reveal that dynamic coordination bonding, the repeated transient dissociation-association of solvent molecules at OMSs, is the key mechanism facilitating this activation, pointing towards new directions for designing activation strategies that prevent structural damage.
Project description: The recent dramatic progress in machine learning is partly attributable to the availability of high-performance computers and development tools. The accelerated linear algebra (XLA) compiler is one such tool: it automatically optimises array operations (mostly by fusing them to reduce memory operations) and compiles the optimised operations into high-performance programs for specific target computing platforms. Like machine-learning models, numerical models are often expressed in array operations, and thus their performance can also be boosted by XLA. This study is the first of its kind to examine the efficiency of XLA for numerical models, and the efficiency is examined stringently by comparing its performance with that of optimal implementations. Two shared-memory computing platforms are examined: the CPU platform and the GPU platform. To obtain optimal implementations, the computing speed and its optimisation are rigorously studied by considering different workloads and the corresponding computer performance. Two simple equations are found to faithfully model the computing speed of numerical models with very few, easily measurable parameters. Regarding operation optimisation within XLA, the results show that models expressed in low-level operations (e.g., slice, concatenation, and arithmetic operations) are successfully fused, whereas high-level operations (e.g., convolution and roll) are not. Regarding compilation within XLA, the results show that for the CPU platform of certain computers, and for certain simple numerical models on the GPU platform, XLA achieves high efficiency (>80%) for large problems and acceptable efficiency (10%-80%) for medium-size problems; the gap arises from the overhead cost of Python. Unsatisfactory performance is found for the CPU platform of other computers, where operations are compiled in a non-optimal way, and for high-dimensional complex models on the GPU platform, where each GPU thread in XLA handles 4 (single-precision) or 2 (double-precision) output elements in the hope of exploiting high-performance instructions that can read or write 4 or 2 floating-point numbers in one instruction. However, these instructions are rarely used in the generated code for complex models, and performance is negatively affected. Therefore, flags should be added to control compilation in these non-optimal scenarios.
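As a small illustration of the kind of numerical-model code that benefits from XLA fusion, here is a one-dimensional diffusion step written with slice and arithmetic operations and compiled with jax.jit; this is a generic example, not one of the models benchmarked in the study:

```python
import jax
import jax.numpy as jnp

# A 1-D explicit diffusion update built from slices and arithmetic; XLA can fuse these
# element-wise operations into a single kernel when the function is jit-compiled.
@jax.jit
def diffusion_step(u, alpha=0.1):
    interior = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u.at[1:-1].set(interior)        # boundaries are left unchanged

u = jnp.linspace(0.0, 1.0, 1_000_000)
u = diffusion_step(u)                      # first call triggers XLA compilation
u.block_until_ready()                      # wait for the asynchronous computation to finish
```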
Project description: Motivation: Mathematical models are nowadays important tools for analyzing the dynamics of cellular processes. The unknown model parameters are usually estimated from experimental data. These data often provide information only about the relative changes between conditions; hence, the observables contain scaling parameters. The unknown scaling parameters and the corresponding noise parameters have to be inferred along with the dynamic parameters. These nuisance parameters often increase the dimensionality of the estimation problem substantially and cause convergence problems. Results: In this manuscript, we propose a hierarchical optimization approach for estimating the parameters of ordinary differential equation (ODE) models from relative data. Our approach restructures the optimization problem into an inner and an outer subproblem. These subproblems possess lower dimensions than the original optimization problem, and the inner problem can be solved analytically. We evaluated the accuracy, robustness and computational efficiency of the hierarchical approach by studying three signaling pathways. The proposed approach achieved better convergence than the standard approach and required a lower computation time. As the hierarchical optimization approach is widely applicable, it provides a powerful alternative to established approaches. Availability and implementation: The code is included in the MATLAB toolbox PESTO, which is available at http://github.com/ICB-DCM/PESTO. Supplementary information: Supplementary data are available at Bioinformatics online.
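A minimal sketch of the hierarchical idea for the common case of a single multiplicative scaling parameter and additive Gaussian noise: the inner problem has a closed-form solution, so only the dynamic parameters remain in the outer numerical optimization. The `simulate` function is a hypothetical placeholder for the ODE solver, and this is a simplification of the general formulation in the paper:

```python
import numpy as np

def inner_problem(sim, data):
    """Analytic inner solution for relative data y ~ s * h(theta) with Gaussian noise:
    the scaling s minimising sum((data - s*sim)^2) is <data, sim> / <sim, sim>,
    and the noise variance follows from the residuals."""
    s = np.dot(data, sim) / np.dot(sim, sim)
    sigma2 = np.mean((data - s * sim) ** 2)
    return s, sigma2

def outer_objective(theta, simulate, data):
    """Outer problem: only the dynamic parameters theta are optimised numerically;
    `simulate(theta)` (hypothetical) returns the model observable at the measured times."""
    sim = simulate(theta)
    s, sigma2 = inner_problem(sim, data)
    n = len(data)
    # Negative log-likelihood with the analytically optimal s and sigma^2 plugged in
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + np.sum((data - s * sim) ** 2) / (2.0 * sigma2)
```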