Dataset Information

Differences in Performance among Test Statistics for Assessing Phylogenomic Model Adequacy.

ABSTRACT: Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
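The abstract does not specify which test statistics or thresholds the study uses. As a rough illustration of the general idea, the sketch below follows the common posterior-predictive or parametric-bootstrap approach. A test statistic is computed on the empirical alignment and on many alignments simulated under the fitted substitution model. The empirical value is then located within the simulated distribution. The function names (`adequacy_summary`, `rejects_model`), the two-tailed p-value, the median/standard-deviation effect size, and the `alpha = 0.05` default are all illustrative assumptions. They are not the new thresholds or the summary statistic proposed by the authors.

```python
import statistics
from typing import Dict, Sequence


def adequacy_summary(t_empirical: float, t_simulated: Sequence[float]) -> Dict[str, float]:
    """Compare an empirical test statistic with values from data simulated under the fitted model."""
    n = len(t_simulated)
    if n < 2:
        raise ValueError("need at least two simulated values")

    # Tail areas: fraction of simulated values at least as extreme in each direction.
    p_upper = sum(t >= t_empirical for t in t_simulated) / n
    p_lower = sum(t <= t_empirical for t in t_simulated) / n
    # Two-tailed predictive p-value, capped at 1.
    p_value = min(1.0, 2.0 * min(p_upper, p_lower))

    # Standardized distance of the empirical value from the centre of the simulated distribution
    # (an assumed effect-size definition, used here for illustration only).
    center = statistics.median(t_simulated)
    spread = statistics.stdev(t_simulated)
    if spread > 0:
        effect_size = abs(t_empirical - center) / spread
    else:
        # Degenerate case: all simulated values are identical.
        effect_size = 0.0 if t_empirical == center else float("inf")

    return {"p_value": p_value, "effect_size": effect_size}


def rejects_model(summary: Dict[str, float], alpha: float = 0.05) -> bool:
    """Hypothetical decision rule: reject the model when the two-tailed p-value falls below alpha."""
    return summary["p_value"] < alpha


if __name__ == "__main__":
    sims = [10.0, 11.0, 12.0, 13.0, 14.0]
    print(adequacy_summary(20.0, sims))
```

With simulated values `[10, 11, 12, 13, 14]` and an empirical value of 20, no simulated value is as large as the observed one. The sketch therefore returns a p-value of 0.0 and an effect size of about 5.06, and the hypothetical rule rejects the model. The abstract's point is that a fixed cut-off like `alpha` can be too lenient for loci with few informative sites. This is why a threshold on a quantity such as the effect size may be worth considering alongside the p-value.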

SUBMITTER: Duchene DA

PROVIDER: S-EPMC6007652 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

Differences in Performance among Test Statistics for Assessing Phylogenomic Model Adequacy.

David A. Duchêne, Sebastian Duchêne, Simon Y. W. Ho

Genome Biology and Evolution, 2018-06-01, 6

PMID: 29788113

Similar Datasets

Project description: Previous studies have proposed that model performance statistics from earlier photochemical grid model (PGM) applications can be used to benchmark performance in new PGM applications. A challenge in implementing this approach is that limited information is available on consistently calculated model performance statistics that vary spatially and temporally over the U.S. Here, a consistent set of model performance statistics is calculated by year, season, region, and monitoring network for PM2.5 and its major components using simulations from versions 4.7.1-5.2.1 of the Community Multiscale Air Quality (CMAQ) model for years 2007-2015. The multi-year set of statistics is then used to provide quantitative context for model performance results from the 2015 simulation. Model performance for PM2.5 organic carbon in the 2015 simulation ranked high (i.e., favorable performance) in the multi-year dataset, due to factors including recent improvements in biogenic secondary organic aerosol and atmospheric mixing parameterizations in CMAQ. Model performance statistics for the Northwest region in 2015 ranked low (i.e., unfavorable performance) for many species in comparison to the 2007-2015 dataset. This finding motivated additional investigation that suggests a need for improved speciation of wildfire PM2.5 emissions and modeling of boundary layer dynamics near water bodies. Several limitations were identified in the approach of benchmarking new model performance results with previous results. Since performance statistics vary widely by region and season, a simple set of national performance benchmarks (e.g., one or two targets per species and statistic) as proposed previously is inadequate to assess model performance throughout the U.S. 
Also, trends in model performance statistics for sulfate over the 2007 to 2015 period suggest that model performance for earlier years may not be a useful reference for assessing model performance for recent years in some cases. Comparisons of results from the 2015 base case with results from five sensitivity simulations demonstrated the importance of parameterizations of NH3 surface exchange, organic aerosol volatility and production, and emissions of crustal cations for predicting PM2.5 species concentrations.

Project description: Background: Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually done using, e.g., the chi-square test (χ²) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that typically are not fulfilled. This problem is partly alleviated by the use of bootstrapping, a computationally heavy approach to calculate an empirical distribution. Bootstrapping also allows for a natural extension to estimation of joint distributions, but this feature has so far been little exploited. Results: We herein show that simplistic combinations of bootstrapped tests, like the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found both consistent and with a higher power than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the most superior test is a 2D χ² vs. χ², where the second χ²-value comes from an additional help model, and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple, or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because LHR has retained most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time. Conclusions: We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. 
These results also provide a deeper insight into the original motivation for formulating the LHR, for the more general setting of nonlinear and non-nested models. These insights are valuable in cases when accuracy and power, rather than computational speed, are prioritized.

Project description: Tracking the state of biodiversity over time is critical to successful conservation, but conventional monitoring schemes tend to be insufficient to adequately quantify how species' abundances and distributions are changing. One solution to this issue is to leverage data generated by citizen scientists, who collect vast quantities of data at temporal and spatial scales that cannot be matched by most traditional monitoring methods. However, the quality of citizen science data can vary greatly. In this paper, we develop three metrics (inventory completeness, range completeness, spatial bias) to assess the adequacy of spatial observation data. We explore the adequacy of citizen science data at the species level for Australia's terrestrial native birds and then model these metrics against a suite of seven species traits (threat status, taxonomic uniqueness, body mass, average count, range size, species density, and human population density) to identify predictors of data adequacy. We find that citizen science data adequacy for Australian birds is increasing across two of our metrics (inventory completeness and range completeness), but not spatial bias, which has worsened over time. Relationships between the three metrics and seven traits we modelled were variable, with only two traits having consistently significant relationships across the three metrics. Our results suggest that although citizen science data adequacy has generally increased over time, there are still gaps in the spatial adequacy of citizen science for monitoring many Australian birds. Despite these gaps, citizen science can play an important role in biodiversity monitoring by providing valuable baseline data that may be supplemented by information collected through other methods. 
We believe the metrics presented here constitute an easily applied approach to assessing the utility of citizen science datasets for biodiversity analyses, allowing researchers to identify and prioritise regions or species with lower data adequacy that will benefit most from targeted monitoring efforts.

Project description: Background: Timely and reliable data on causes of death are fundamental for informed decision-making in the health sector as well as public health research. An in-depth understanding of the quality of data from vital statistics (VS) is therefore indispensable for health policymakers and researchers. We propose a summary index to objectively measure the performance of VS systems in generating reliable mortality data and apply it to the comprehensive cause of death database assembled for the Global Burden of Disease (GBD) 2013 Study. Methods: We created a Vital Statistics Performance Index, a composite of six dimensions of VS strength, each assessed by a separate empirical indicator. The six dimensions include: quality of cause of death reporting, quality of age and sex reporting, internal consistency, completeness of death reporting, level of cause-specific detail, and data availability/timeliness. A simulation procedure was developed to combine indicators into a single index. This index was computed for all country-years of VS in the GBD 2013 cause of death database, yielding annual estimates of overall VS system performance for 148 countries or territories. Results: The six dimensions impacted the accuracy of data to varying extents. VS performance declines more steeply with declining simulated completeness than for any other indicator. The amount of detail in the cause list reported has a concave relationship with overall data accuracy, but is an important driver of observed VS performance. Indicators of cause of death data quality and age/sex reporting have more linear relationships with simulated VS performance, but poor cause of death reporting influences observed VS performance more strongly. 
VS performance is steadily improving at an average rate of 2.10% per year among the 148 countries that have available data, but only 19.0% of global deaths post-2000 occurred in countries with well-performing VS systems. Conclusions: Objective and comparable information about the performance of VS systems and the utility of the data that they report will help to focus efforts to strengthen VS systems. Countries and the global health community alike need better intelligence about the accuracy of VS that are widely and often uncritically used in population health research and monitoring.
