Dataset Information

Accurate and Efficient P-value Calculation via Gaussian Approximation: a Novel Monte-Carlo Method.

ABSTRACT: It is of fundamental interest in statistics to test the significance of a set of covariates. For example, in genome-wide association studies, a joint null hypothesis of no genetic effect is tested for a set of multiple genetic variants. The minimum p-value method, higher criticism, and Berk-Jones tests are particularly effective when the covariates with nonzero effects are sparse. However, the correlations among covariates and the non-Gaussian distribution of the response make it very challenging to calculate p-values for these three tests. In practice, permutation is commonly used to obtain accurate p-values, but it is computationally very intensive, especially when a large number of hypothesis tests must be conducted. In this paper, we propose a Gaussian approximation method based on a Monte Carlo scheme, which is computationally more efficient than permutation while still achieving similar accuracy. We derive non-asymptotic approximation error bounds that could vanish in the limit even if the number of covariates is much larger than the sample size. Through real-genotype-based simulations and data analysis of a genome-wide association study of Crohn's disease, we compare the accuracy and computational cost of our proposed method, of permutation, and of the method based on the asymptotic distribution.
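As a rough illustration of the idea in the abstract, the sketch below computes a Monte Carlo p-value for a minimum p-value (maximum |z|) statistic by drawing correlated Gaussian vectors instead of permuting the response. It is not the authors' algorithm or code: the function name `minp_gaussian_mc`, the use of standardized marginal score statistics, and the multiplier-style draw `Xc.T @ g / sqrt(n)` are all assumptions made for this example.

```python
import numpy as np

def minp_gaussian_mc(X, y, n_draws=10000, seed=0):
    """Hypothetical sketch: Monte Carlo p-value for the max-|z| statistic.

    Assumes the null vector of marginal z-statistics is approximately
    N(0, R), where R is the sample correlation matrix of the covariates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Standardize columns so that Xc.T @ Xc / n equals the correlation matrix R.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    # Marginal score statistics: sqrt(n) times the sample correlations.
    z = Xc.T @ yc / np.sqrt(n)
    t_obs = np.max(np.abs(z))
    # Draw from N(0, R) without factorizing R: if g ~ N(0, I_n), then
    # Xc.T @ g / sqrt(n) has covariance R. This also works when d > n.
    G = rng.standard_normal((n, n_draws))
    Z = Xc.T @ G / np.sqrt(n)
    t_null = np.max(np.abs(Z), axis=0)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(t_null >= t_obs)) / (1 + n_draws)
    return t_obs, p_value
```

Each draw costs one matrix-vector product rather than a full refit of the test statistics on permuted data, which is the kind of saving the abstract points to; the accuracy of the normal approximation is exactly what the paper's error bounds address.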

SUBMITTER: Liu Y

PROVIDER: S-EPMC6530914 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Accurate and Efficient P-value Calculation via Gaussian Approximation: a Novel Monte-Carlo Method.

Liu Yaowu, Xie Jun

Journal of the American Statistical Association, 2018-06-28, 525

PMID: 31130762

Similar Datasets

Project description: Background: The expected value of sample information (EVSI) measures the expected benefits that could be obtained by collecting additional data. Estimating EVSI using the traditional nested Monte Carlo method is computationally expensive, but the recently developed Gaussian approximation (GA) approach can efficiently estimate EVSI across different sample sizes. However, the conventional GA may result in biased EVSI estimates if the decision models are highly nonlinear. This bias may lead to suboptimal study designs when GA is used to optimize the value of different studies. Therefore, we extend the conventional GA approach to improve its performance for nonlinear decision models. Methods: Our method provides accurate EVSI estimates by approximating the conditional expectation of the benefit based on 2 steps. First, a Taylor series approximation is applied to estimate the conditional expectation of the benefit as a function of the conditional moments of the parameters of interest using a spline, which is fitted to the samples of the parameters and the corresponding benefits. Next, the conditional moments of parameters are approximated by the conventional GA and Fisher information. The proposed approach is applied to several data collection exercises involving non-Gaussian parameters and nonlinear decision models. Its performance is compared with the nested Monte Carlo method, the conventional GA approach, and the nonparametric regression-based method for EVSI calculation. Results: The proposed approach provides accurate EVSI estimates across different sample sizes when the parameters of interest are non-Gaussian and the decision models are nonlinear. The computational cost of the proposed method is similar to that of other novel methods. Conclusions: The proposed approach can estimate EVSI across sample sizes accurately and efficiently, which may support researchers in determining an economically optimal study design using EVSI. Highlights: The Gaussian approximation method efficiently estimates the expected value of sample information (EVSI) for clinical trials with varying sample sizes, but it may introduce bias when health economic models have a nonlinear structure. We introduce the spline-based Taylor series approximation method and combine it with the original Gaussian approximation to correct the nonlinearity-induced bias in EVSI estimation. Our approach can provide more precise EVSI estimates for complex decision models without sacrificing computational efficiency, which can enhance the resource allocation strategies from the cost-effective perspective.

Project description: Offshore Probabilistic Tsunami Hazard Assessments (offshore PTHAs) provide large-scale analyses of earthquake-tsunami frequencies and uncertainties in the deep ocean, but do not provide high-resolution onshore tsunami hazard information as required for many risk-management applications. To understand the implications of an offshore PTHA for the onshore hazard at any site, in principle the tsunami inundation should be simulated locally for every earthquake scenario in the offshore PTHA. In practice this is rarely feasible due to the computational expense of inundation models, and the large number of scenarios in offshore PTHAs. Monte Carlo methods offer a practical and rigorous alternative for approximating the onshore hazard, using a random subset of scenarios. The resulting Monte Carlo errors can be quantified and controlled, enabling high-resolution onshore PTHAs to be implemented at a fraction of the computational cost. This study develops efficient Monte Carlo approaches for offshore-to-onshore PTHA. Modelled offshore PTHA wave heights are used to preferentially sample scenarios that have large offshore waves near an onshore site of interest. By appropriately weighting the scenarios, the Monte Carlo errors are reduced without introducing bias. The techniques are demonstrated in a high-resolution onshore PTHA for the island of Tongatapu in Tonga, using the 2018 Australian PTHA as the offshore PTHA, while considering only thrust earthquake sources on the Kermadec-Tonga trench. The efficiency improvements are equivalent to using 4-18 times more random scenarios, as compared with stratified sampling by magnitude, which is commonly used for onshore PTHA. The greatest efficiency improvements are for rare, large tsunamis, and for calculations that represent epistemic uncertainties in the tsunami hazard.
To facilitate the control of Monte Carlo errors in practical applications, this study also provides analytical techniques for estimating the errors both before and after inundation simulations are conducted. Before inundation simulation, this enables a proposed Monte Carlo sampling scheme to be checked, and potentially improved, at minimal computational cost. After inundation simulation, it enables the remaining Monte Carlo errors to be quantified at onshore sites, without additional inundation simulations. In combination these techniques enable offshore PTHAs to be rigorously transformed into onshore PTHAs, with quantification of epistemic uncertainties, while controlling Monte Carlo errors.
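The weighting idea in this description (sample scenarios preferentially, then reweight so no bias is introduced) can be illustrated with a generic importance-sampling estimator. This is a minimal sketch under assumptions, not the study's actual scheme: the function name, the choice of sampling probability proportional to rate times offshore wave height, and the callback `onshore_depth_fn` (standing in for an expensive inundation simulation) are all hypothetical.

```python
import numpy as np

def importance_sampled_rate(rates, offshore_height, onshore_depth_fn,
                            threshold, n_samples, seed=0):
    """Hypothetical sketch: estimate sum_i rates[i] * 1[depth_i > threshold]
    from a random subset of scenarios drawn with replacement."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    # Sampling probabilities favour scenarios with large offshore waves.
    # They must be positive for every scenario that can exceed the threshold.
    q = rates * np.asarray(offshore_height, dtype=float)
    q = q / q.sum()
    idx = rng.choice(len(rates), size=n_samples, replace=True, p=q)
    # Weighting each draw by rate / (n_samples * q) keeps the estimator
    # unbiased whatever q is; q only affects the variance.
    weights = rates[idx] / (n_samples * q[idx])
    # Only the sampled scenarios need the expensive onshore simulation.
    depths = np.array([onshore_depth_fn(i) for i in idx])
    return float(np.sum(weights * (depths > threshold)))
```

Because the expectation of each weighted term equals the full sum over scenarios, changing the sampling probabilities trades variance, not bias, which is why concentrating samples on scenarios likely to inundate the site can reduce Monte Carlo error for rare, large events.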
