Dataset Information

Assessing parameter identifiability in phylogenetic models using data cloning.

ABSTRACT: The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. This new computer-intensive approach to phylogenetics demands validation studies and sound measures of performance. To date there has been little practical guidance available as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology to compute the maximum likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter nonidentifiability (NI) and distinguish it from other parameter estimability problems, including when parameters are structurally identifiable, but are not estimable in a given data set (INE), and when parameters are identifiable, and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use Bayesian phylogenetics software to diagnose nonidentifiability. Theoreticians and practitioners alike now have a powerful, yet simple tool to detect nonidentifiability while investigating complex modeling scenarios, where getting closed-form expressions in a probabilistic study is complicated. Furthermore, here we also show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
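The diagnostic idea in the abstract can be illustrated with a toy Gaussian model; this is a minimal sketch under stated assumptions, not the authors' phylogenetic implementation. The model (observations with mean `a + b`, so only the sum is identifiable), the prior variance, and the function names `cloned_posterior_cov` and `scaled_variances` are all hypothetical choices made for illustration. Because the model is conjugate, the posterior covariance under K clones has a closed form, so no MCMC is needed here; in practice these variances would be estimated from posterior samples produced by Bayesian software.

```python
import numpy as np

def cloned_posterior_cov(n_obs, n_clones, prior_var=10.0, noise_var=1.0):
    """Posterior covariance of (a, b) when y_i ~ N(a + b, noise_var) and the
    data set is cloned n_clones times. Priors a, b ~ N(0, prior_var) are an
    assumption of this sketch. For this Gaussian model the covariance does
    not depend on the observed values, only on how many there are."""
    ones = np.ones((2, 1))
    precision = np.eye(2) / prior_var + (n_clones * n_obs / noise_var) * (ones @ ones.T)
    return np.linalg.inv(precision)

def scaled_variances(n_obs=20, clones=(1, 2, 4, 8, 16, 32)):
    """For each number of clones K, return (K, Var(a+b) scaled by its K=1
    value, largest covariance eigenvalue scaled by its K=1 value)."""
    one = np.ones(2)
    base = cloned_posterior_cov(n_obs, 1)
    base_sum_var = one @ base @ one
    base_eig = np.linalg.eigvalsh(base)[-1]
    rows = []
    for k in clones:
        cov = cloned_posterior_cov(n_obs, k)
        sum_var = one @ cov @ one                  # variance of the identifiable sum a + b
        largest_eig = np.linalg.eigvalsh(cov)[-1]  # dominated by the non-identifiable direction a - b
        rows.append((k, sum_var / base_sum_var, largest_eig / base_eig))
    return rows

for k, sum_ratio, eig_ratio in scaled_variances():
    print(f"K={k:2d}  scaled Var(a+b)={sum_ratio:.4f}  scaled largest eigenvalue={eig_ratio:.4f}")
```

In this sketch the scaled variance of the identifiable combination shrinks roughly like 1/K, while the scaled largest eigenvalue stays near 1 because cloning adds no information about `a - b`; a variance that refuses to shrink as clones are added is the kind of signature the abstract describes for structural nonidentifiability.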

SUBMITTER: Ponciano JM

PROVIDER: S-EPMC3478565 | biostudies-literature | 2012 Dec

REPOSITORIES: biostudies-literature

Publications

Assessing parameter identifiability in phylogenetic models using data cloning.

Ponciano JM, Burleigh JG, Braun EL, Taper ML

Systematic Biology, 2012-05-30, issue 6

PMID: 22649181

Similar Datasets

Project description: BACKGROUND: Mathematical modeling is now frequently used in outbreak investigations to understand underlying mechanisms of infectious disease dynamics, assess patterns in epidemiological data, and forecast the trajectory of epidemics. However, the successful application of mathematical models to guide public health interventions lies in the ability to reliably estimate model parameters and their corresponding uncertainty. Here, we present and illustrate a simple computational method for assessing parameter identifiability in compartmental epidemic models. METHODS: We describe a parametric bootstrap approach to generate simulated data from dynamical systems to quantify parameter uncertainty and identifiability. We calculate confidence intervals and mean squared error of estimated parameter distributions to assess parameter identifiability. To demonstrate this approach, we begin with a low-complexity SEIR model and work through examples of increasingly complex compartmental models that correspond with applications to pandemic influenza, Ebola, and Zika. RESULTS: Overall, parameter identifiability issues are more likely to arise with more complex models (based on number of equations/states and parameters). As the number of parameters being jointly estimated increases, the uncertainty surrounding estimated parameters tends to increase on average. We found that, in most cases, R0 is robust to parameter identifiability issues affecting individual parameters in the model. Despite large confidence intervals and higher mean squared error of other individual model parameters, R0 can still be estimated with precision and accuracy. CONCLUSIONS: Because public health policies can be influenced by results of mathematical modeling studies, it is important to conduct parameter identifiability analyses prior to fitting the models to available data and to report parameter estimates with quantified uncertainty. The method described is helpful in these regards and enhances the essential toolkit for conducting model-based inferences using compartmental dynamic models.
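The parametric bootstrap workflow described above (fit, simulate from the fit, refit, then summarize with confidence intervals and mean squared error) can be sketched as follows. This is a simplified illustration under stated assumptions: it swaps the SEIR model for a hypothetical exponential-growth incidence curve with Poisson observation noise, and fits by log-linear least squares. The function names, the true values (`i0 = 50`, `r = 0.2`), and the seeds are all invented for the example.

```python
import numpy as np

def fit_growth(times, counts):
    """Log-linear least-squares estimate of (i0, r) for mean = i0 * exp(r * t).
    Counts are clipped at 1 so the logarithm is always defined."""
    slope, intercept = np.polyfit(times, np.log(np.clip(counts, 1, None)), 1)
    return np.exp(intercept), slope

def parametric_bootstrap(times, counts, n_boot=500, seed=0):
    """Return the fitted growth rate, its 95% percentile interval, and the
    mean squared error of the bootstrap estimates around the fitted value."""
    rng = np.random.default_rng(seed)
    i0_hat, r_hat = fit_growth(times, counts)        # step 1: fit the observed data
    mean_curve = i0_hat * np.exp(r_hat * times)
    r_boot = np.empty(n_boot)
    for b in range(n_boot):
        simulated = rng.poisson(mean_curve)          # step 2: simulate from the fitted model
        r_boot[b] = fit_growth(times, simulated)[1]  # step 3: refit each simulated data set
    ci = np.percentile(r_boot, [2.5, 97.5])          # step 4: summarize the estimates
    mse = np.mean((r_boot - r_hat) ** 2)
    return r_hat, ci, mse

# Synthetic "observed" data generated from assumed true values i0 = 50, r = 0.2.
times = np.arange(15.0)
observed = np.random.default_rng(1).poisson(50 * np.exp(0.2 * times))
r_hat, ci, mse = parametric_bootstrap(times, observed)
print(f"r_hat={r_hat:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})  MSE={mse:.2e}")
```

A narrow interval and small mean squared error suggest the parameter is well identified from data of this kind; a wide interval or large error for a parameter, repeated across simulated data sets, is the warning sign the description refers to.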

Project description: BACKGROUND: Heidenreich et al. (Risk Anal 1997 17 391-399) considered parameter identifiability in the context of the two-mutation cancer model and demonstrated that combinations of all but two of the model parameters are identifiable. We consider the problem of identifiability in the recently developed carcinogenesis models of Little and Wright (Math Biosci 2003 183 111-134) and Little et al. (J Theoret Biol 2008 254 229-238). These models, which incorporate genomic instability, generalize a large number of other quasi-biological cancer models, in particular those of Armitage and Doll (Br J Cancer 1954 8 1-12), the two-mutation model (Moolgavkar et al. Math Biosci 1979 47 55-77), the generalized multistage model of Little (Biometrics 1995 51 1278-1291), and a recently developed cancer model of Nowak et al. (PNAS 2002 99 16226-16231). METHODOLOGY/PRINCIPAL FINDINGS: We show that in the simpler model proposed by Little and Wright (Math Biosci 2003 183 111-134) the number of identifiable combinations of parameters is at most two less than the number of biological parameters, thereby generalizing previous results of Heidenreich et al. (Risk Anal 1997 17 391-399) for the two-mutation model. For the more general model of Little et al. (J Theoret Biol 2008 254 229-238) the number of identifiable combinations of parameters is at most less than the number of biological parameters, where is the number of destabilization types, thereby also generalizing all these results. Numerical evaluations suggest that these bounds are sharp. We also identify particular combinations of identifiable parameters. CONCLUSIONS/SIGNIFICANCE: We have shown that the previous results on parameter identifiability can be generalized to much larger classes of quasi-biological carcinogenesis models, and also identify particular combinations of identifiable parameters. These results are of theoretical interest, but also of practical significance to anyone attempting to estimate parameters for this large class of cancer models.

Project description: BACKGROUND: Kinetic models of biochemical systems usually consist of ordinary differential equations that have many unknown parameters. Some of these parameters are often practically unidentifiable, that is, their values cannot be uniquely determined from the available data. Possible causes are lack of influence on the measured outputs, interdependence among parameters, and poor data quality. Uncorrelated parameters can be seen as the key tuning knobs of a predictive model. Therefore, before attempting to perform parameter estimation (model calibration) it is important to characterize the subset(s) of identifiable parameters and their interplay. Once this is achieved, it is still necessary to perform parameter estimation, which poses additional challenges. METHODS: We present a methodology that (i) detects high-order relationships among parameters, and (ii) visualizes the results to facilitate further analysis. We use a collinearity index to quantify the correlation between parameters in a group in a computationally efficient way. Then we apply integer optimization to find the largest groups of uncorrelated parameters. We also use the collinearity index to identify small groups of highly correlated parameters. The result files can be visualized using Cytoscape, showing the identifiable and non-identifiable groups of parameters together with the model structure in the same graph. RESULTS: Our contributions alleviate the difficulties that appear at different stages of the identifiability analysis and parameter estimation process. We show how to combine global optimization and regularization techniques for calibrating medium and large scale biological models with moderate computation times. Then we evaluate the practical identifiability of the estimated parameters using the proposed methodology. The identifiability analysis techniques are implemented as a MATLAB toolbox called VisId, which is freely available as open source from GitHub (https://github.com/gabora/visid). CONCLUSIONS: Our approach is geared towards scalability. It enables the practical identifiability analysis of dynamic models of large size, and accelerates their calibration. The visualization tool allows modellers to detect parts that are problematic and need refinement or reformulation, and provides experimentalists with information that can be helpful in the design of new experiments.
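For readers unfamiliar with the collinearity index mentioned above, one common definition takes a sensitivity matrix (rows are measurements, columns are parameters), scales each column to unit length, and returns the inverse square root of the smallest eigenvalue of the resulting Gram matrix. The sketch below uses that definition; whether VisId computes the index in exactly this form is an assumption, and the function name and example matrices are hypothetical. It is written in Python for illustration and is not the MATLAB toolbox code.

```python
import numpy as np

def collinearity_index(sensitivities):
    """Collinearity index of a parameter group: 1 / sqrt(smallest eigenvalue
    of S^T S), where S holds the sensitivity columns scaled to unit norm.
    A value of 1 means orthogonal columns; large values mean the parameters'
    effects on the outputs can nearly cancel each other."""
    s = np.asarray(sensitivities, dtype=float)
    s_norm = s / np.linalg.norm(s, axis=0)            # unit-length columns
    smallest = np.linalg.eigvalsh(s_norm.T @ s_norm)[0]
    smallest = max(smallest, np.finfo(float).eps)     # guard against exact collinearity
    return 1.0 / np.sqrt(smallest)

# Two parameters with unrelated effects on three hypothetical measurements.
independent = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
# Two parameters whose effects are almost proportional to each other.
nearly_collinear = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.01]]

print(collinearity_index(independent))
print(collinearity_index(nearly_collinear))
```

The first group yields an index of about 1, while the nearly proportional columns yield a value in the hundreds, which is how the index flags groups of highly correlated parameters.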
