Project description: Researchers often have informative hypotheses in mind when comparing means across treatment groups, such as H1: μA < μB < μC and H2: μB < μA < μC, and want to compare these hypotheses to each other directly. This can be done by means of Bayesian inference. This article discusses the disadvantages of the frequentist approach to null hypothesis testing and the advantages of the Bayesian approach, and demonstrates how to use the Bayesian approach to hypothesis testing in the setting of cluster-randomized trials. Data from a school-based smoking prevention intervention with four treatment groups are used to illustrate the Bayesian approach. The main advantage of the Bayesian approach is that it quantifies the degree of evidence the collected data provide in favor of an informative hypothesis. Furthermore, a simulation study was conducted to investigate how Bayes factors behave in cluster-randomized trials. The results of the simulation study showed that the Bayes factor increases with increasing number of clusters, cluster size, and effect size, and decreases with increasing intraclass correlation coefficient. The effect of the number of clusters is stronger than the effect of cluster size. With a small number of clusters, the effect of increasing cluster size may be weak, especially when the intraclass correlation coefficient is large. In conclusion, the study showed that the Bayes factor is affected by sample size and intraclass correlation in much the same way as these parameters affect statistical power in frequentist null hypothesis significance testing. Bayesian evaluation of informative hypotheses may therefore be used as an alternative to null hypothesis significance testing.
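As a hedged illustration of the core computation, the sketch below estimates the Bayes factor of an order-constrained hypothesis against the unconstrained (encompassing) alternative as the ratio of the posterior to the prior probability that the ordering holds. The posterior draws, their means and spreads, and the sample sizes are hypothetical stand-ins; in practice they would come from a multilevel model fitted to the cluster-randomized data.

```python
# Minimal sketch of the encompassing-prior Bayes factor for an order-constrained
# hypothesis H1: mu_A < mu_B < mu_C (hypothetical numbers, not the paper's model).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws for the three group means; here independent normals
# stand in for real posterior samples from a multilevel model.
post = {
    "A": rng.normal(2.0, 0.30, 50_000),
    "B": rng.normal(2.4, 0.30, 50_000),
    "C": rng.normal(2.9, 0.30, 50_000),
}

# Posterior probability that the hypothesized ordering holds.
f_post = np.mean((post["A"] < post["B"]) & (post["B"] < post["C"]))

# Under a prior that is exchangeable in the three means, each of the 3! = 6
# orderings is equally likely a priori.
f_prior = 1.0 / 6.0

# Bayes factor of the order-constrained hypothesis against the unconstrained one.
bf_1u = f_post / f_prior
print(f"P(mu_A < mu_B < mu_C | data) = {f_post:.3f},  BF_1u = {bf_1u:.2f}")
```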
Project description: Motivation: Modern high-throughput biotechnologies such as microarrays are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment only a limited number of samples is assayed, giving rise to the classical 'large p, small n' problem. On the other hand, the rapid adoption of these high-throughput technologies has resulted in a substantial collection of data, often generated on the same platform and using the same protocol. It is highly desirable to utilize this existing data when performing analysis and inference on a new dataset. Results: Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data is exploited to build informative priors that are then used in the analysis of new data. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them to detect differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods, including popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. Availability and implementation: Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
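The following sketch conveys the general idea of borrowing strength from historical data, not the IPBT algorithm itself: an inverse-gamma prior for gene-wise variances is moment-matched to historical variance estimates, and its conjugate posterior mean is used to moderate a two-sample t-statistic on a new, small-sample dataset. All data and hyperparameters are simulated placeholders.

```python
# Sketch (not the IPBT algorithm): build an informative inverse-gamma prior for
# gene-wise variances from historical data and use the posterior-mean variance to
# moderate a two-sample t-statistic on a new, small-n dataset.
import numpy as np

rng = np.random.default_rng(0)
G = 1000                                    # number of genes (hypothetical)

# Historical data: many samples per gene -> stable variance estimates.
hist = rng.normal(0.0, 1.0, size=(G, 40))
s2_hist = hist.var(axis=1, ddof=1)

# Fit an inverse-gamma(a, b) prior to the historical variances by moment matching.
m, v = s2_hist.mean(), s2_hist.var()
a = m**2 / v + 2.0
b = m * (a - 1.0)

# New experiment: only 3 vs 3 samples per gene.
x = rng.normal(0.0, 1.0, size=(G, 3))
y = rng.normal(0.0, 1.0, size=(G, 3))
s2_pooled = (x.var(axis=1, ddof=1) + y.var(axis=1, ddof=1)) / 2.0
n_eff = 4                                   # pooled residual degrees of freedom

# Posterior mean of each gene's variance under the conjugate inverse-gamma prior.
s2_post = (b + 0.5 * n_eff * s2_pooled) / (a + 0.5 * n_eff - 1.0)

# Moderated t-like statistic using the shrunken variance.
t_mod = (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(s2_post * (1 / 3 + 1 / 3))
print(t_mod[:5])
```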
Project description: This paper presents three objective Bayesian methods for analyzing bilateral data under Dallal's model and the saturated model. Three parameters are of interest, namely, the risk difference, the risk ratio, and the odds ratio. We derive Jeffreys' prior and Bernardo's reference prior associated with the three parameters that characterize Dallal's model. We derive the functional forms of the posterior distributions of the risk difference and the risk ratio and discuss how to sample from them. We demonstrate the use of the proposed methodology with two real data examples. We also investigate small-, moderate-, and large-sample properties of the proposed methodology and the frequentist counterpart via simulations.
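As a simplified, hedged illustration (independent binomial arms rather than Dallal's correlated bilateral-data model), the sketch below draws Monte Carlo samples from Jeffreys Beta(1/2, 1/2) posteriors for two proportions and summarizes the induced posteriors of the three parameters of interest; the counts are hypothetical.

```python
# Simplified illustration (independent binomial arms, not Dallal's model): Jeffreys
# Beta(1/2, 1/2) priors and Monte Carlo posterior samples of the risk difference,
# risk ratio, and odds ratio.
import numpy as np
from scipy import stats

x1, n1 = 18, 60     # events / trials in group 1 (hypothetical counts)
x2, n2 = 9, 55      # events / trials in group 2

rng = np.random.default_rng(42)
S = 100_000

# Beta posteriors under the Jeffreys prior Beta(0.5, 0.5).
p1 = stats.beta.rvs(x1 + 0.5, n1 - x1 + 0.5, size=S, random_state=rng)
p2 = stats.beta.rvs(x2 + 0.5, n2 - x2 + 0.5, size=S, random_state=rng)

risk_diff = p1 - p2
risk_ratio = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))

for name, draws in [("RD", risk_diff), ("RR", risk_ratio), ("OR", odds_ratio)]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name}: posterior median {np.median(draws):.3f}, 95% CrI ({lo:.3f}, {hi:.3f})")
```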
Project description: We discuss inference for a human phage display experiment with three stages. The data are tripeptide counts by tissue and stage. The primary aim of the experiment is to identify ligands that bind with high affinity to a given tissue. We formalize the research question as inference about the monotonicity of mean counts over stages. The inference goal is then to identify a list of peptide-tissue pairs with significant increase over stages. We use a semiparametric Dirichlet process mixture of Poisson model. The posterior distribution under this model allows the desired inference about the monotonicity of mean counts. However, the desired inference summary as a list of peptide-tissue pairs with significant increase involves a massive multiplicity problem. We consider two alternative approaches to address this multiplicity issue. First we propose an approach based on the control of the posterior expected false discovery rate. We notice that the implied solution ignores the relative size of the increase. This motivates a second approach based on a utility function that includes explicit weights for the size of the increase.
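The list-selection step based on the posterior expected false discovery rate can be sketched as follows: given posterior probabilities of a true increase for each peptide-tissue pair (here hypothetical numbers rather than output of the Dirichlet process mixture model), report the largest list whose average posterior probability of a false discovery stays below a chosen level.

```python
# Minimal sketch of selecting a list of peptide-tissue pairs by controlling the
# posterior expected false discovery rate (FDR); probabilities are hypothetical.
import numpy as np

def select_by_posterior_fdr(prob_increase, alpha=0.10):
    """Return indices of the largest list whose posterior expected FDR <= alpha."""
    order = np.argsort(prob_increase)[::-1]          # most promising pairs first
    sorted_probs = prob_increase[order]
    # Posterior expected FDR of the top-k list is the mean of (1 - v_i) over it.
    efdr = np.cumsum(1.0 - sorted_probs) / np.arange(1, len(sorted_probs) + 1)
    k = np.max(np.nonzero(efdr <= alpha)[0], initial=-1) + 1
    return order[:k], (efdr[k - 1] if k > 0 else 0.0)

rng = np.random.default_rng(7)
v = rng.beta(0.3, 0.3, size=500)                     # hypothetical posterior probabilities
flagged, achieved_fdr = select_by_posterior_fdr(v, alpha=0.10)
print(f"{flagged.size} pairs flagged, posterior expected FDR = {achieved_fdr:.3f}")
```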
Project description: There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.
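The adaptive behaviour that motivates such priors can be sketched with a much simpler surrogate, a two-component gamma mixture prior on Poisson rates rather than the continuous local-global shrinkage priors developed above: near-zero counts are shrunk hard toward the background rate while large counts are left essentially unshrunk. All mixture weights and hyperparameters below are illustrative choices.

```python
# Simplified surrogate (two-component gamma mixture, not the paper's continuous
# local-global prior): posterior mean rates adapt so that the many near-zero counts
# are shrunk strongly while genuine signals are not.
import numpy as np
from scipy.stats import nbinom

# Mixture prior on Poisson rates: a "null" component concentrated near zero and a
# diffuse "signal" component (all hyperparameters are illustrative choices).
w_sig = 0.02
a0, b0 = 0.5, 5.0          # null component: rates near zero
a1, b1 = 2.0, 0.1          # signal component: large, diffuse rates

def posterior_mean_rate(y):
    # The marginal likelihood of a count under each gamma-Poisson component is
    # negative binomial with n = shape and p = rate / (rate + 1).
    m0 = nbinom.pmf(y, a0, b0 / (b0 + 1.0))
    m1 = nbinom.pmf(y, a1, b1 / (b1 + 1.0))
    p_sig = w_sig * m1 / (w_sig * m1 + (1 - w_sig) * m0)
    # Component-wise conjugate posterior means, averaged with the posterior weights.
    return (1 - p_sig) * (a0 + y) / (b0 + 1.0) + p_sig * (a1 + y) / (b1 + 1.0)

for y in [0, 1, 2, 5, 25]:
    print(f"count {y:>2d} -> posterior mean rate {posterior_mean_rate(y):.2f}")
```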
Project description: Sparse-data problems are common, and approaches are needed to evaluate the sensitivity of parameter estimates based on sparse data. We propose a Bayesian approach that uses weakly informative priors to quantify sensitivity of parameters to sparse data. The weakly informative prior is based on accumulated evidence regarding the expected magnitude of relationships using relative measures of disease association. We illustrate the use of weakly informative priors with an example of the association of lifetime alcohol consumption and head and neck cancer. When data are sparse and the observed information is weak, a weakly informative prior will shrink parameter estimates toward the prior mean. Additionally, the example shows that when data are not sparse and the observed information is not weak, a weakly informative prior is not influential. Advancements in implementation of Markov chain Monte Carlo simulation make this sensitivity analysis easily accessible to the practicing epidemiologist.
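A minimal sketch of the idea, using hypothetical sparse data rather than the head-and-neck cancer example: a Bayesian logistic regression with a weakly informative normal prior on the log odds ratio, fitted with a short random-walk Metropolis sampler, so that the sparse likelihood is stabilized by shrinkage toward the prior mean.

```python
# Minimal sketch (hypothetical data): Bayesian logistic regression for a single
# exposure with a weakly informative normal prior on the log odds ratio, fitted
# with a random-walk Metropolis sampler.
import numpy as np

# Sparse data: exposed (x = 1) and unexposed (x = 0) groups with few events.
x = np.array([1] * 12 + [0] * 150)
y = np.array([1] * 3 + [0] * 9 + [1] * 5 + [0] * 145)   # 3/12 vs 5/150 cases

def log_post(beta0, beta1, prior_sd=1.0):
    eta = beta0 + beta1 * x
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    # Flat-ish prior on the intercept; weakly informative N(0, prior_sd^2) on log OR.
    logprior = -0.5 * (beta0 / 10.0) ** 2 - 0.5 * (beta1 / prior_sd) ** 2
    return loglik + logprior

rng = np.random.default_rng(0)
draws, cur = [], np.array([-3.0, 0.0])
cur_lp = log_post(*cur)
for _ in range(20_000):
    prop = cur + rng.normal(0.0, 0.4, size=2)
    prop_lp = log_post(*prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:        # Metropolis accept/reject
        cur, cur_lp = prop, prop_lp
    draws.append(cur[1])

log_or = np.array(draws[5_000:])                         # discard burn-in
print(f"posterior median OR = {np.exp(np.median(log_or)):.2f}")
```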
Project description: We propose a spatial Bayesian variable selection method for detecting blood oxygenation level dependent (BOLD) activation in functional magnetic resonance imaging (fMRI) data. Typical fMRI experiments generate large datasets that exhibit complex spatial and temporal dependence. Fitting a full statistical model to such data can be so computationally burdensome that many practitioners resort to fitting oversimplified models, which can lead to lower quality inference. We develop a full statistical model that permits efficient computation. Our approach eases the computational burden in two ways. We partition the brain into 3D parcels, and fit our model to the parcels in parallel. Voxel-level activation within each parcel is modeled as a set of regressions located on a lattice. Regressors represent the magnitude of change in blood oxygenation in response to a stimulus, while a latent indicator for each regressor represents whether the change is zero or non-zero. A sparse spatial generalized linear mixed model (SGLMM) captures the spatial dependence among indicator variables within a parcel and for a given stimulus. The sparse SGLMM permits considerably more efficient computation than does the spatial model typically employed in fMRI. Through simulation we show that our parcellation scheme performs well in various realistic scenarios. Importantly, indicator variables on the boundary between parcels do not exhibit edge effects. We conclude by applying our methodology to data from a task-based fMRI experiment.
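The role of the latent indicator can be conveyed with a deliberately simplified sketch that ignores the spatial SGLMM coupling between neighbouring voxels: for a single voxel and a single stimulus regressor, a spike-and-slab prior on the activation amplitude yields a closed-form posterior probability that the BOLD change is non-zero. The design, noise and slab variances, and prior inclusion probability below are hypothetical.

```python
# Simplified sketch of the latent-indicator idea for one voxel and one stimulus
# regressor, ignoring the spatial SGLMM prior that couples neighbouring voxels.
import numpy as np

def posterior_inclusion_prob(y, x, sigma2=1.0, tau2=4.0, prior_incl=0.1):
    """P(indicator = 1 | y) for y = beta*x + noise, beta ~ (1-w) d_0 + w N(0, tau2)."""
    s = x @ x
    z = x @ y
    # Log Bayes factor of "beta != 0" vs "beta = 0" (marginal likelihood ratio).
    log_bf = -0.5 * np.log1p(tau2 * s / sigma2) \
             + 0.5 * tau2 * z**2 / (sigma2 * (sigma2 + tau2 * s))
    odds = prior_incl / (1.0 - prior_incl) * np.exp(log_bf)
    return odds / (1.0 + odds)

rng = np.random.default_rng(11)
n = 200
x = rng.normal(size=n)                     # stand-in for a convolved stimulus regressor
y_active = 0.8 * x + rng.normal(size=n)    # voxel with a genuine BOLD response
y_null = rng.normal(size=n)                # voxel with no response

print(f"active voxel: P(non-zero) = {posterior_inclusion_prob(y_active, x):.3f}")
print(f"null voxel:   P(non-zero) = {posterior_inclusion_prob(y_null, x):.3f}")
```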
Project description: Informative priors can be a useful tool for epidemiologists to handle problems of sparse data in regression modeling. It is sometimes the case that an investigator is studying a population exposed to two agents, X and Y, where Y is the agent of primary interest. Previous research may suggest that the exposures have different effects on the health outcome of interest, one being more harmful than the other. Such information may be derived from epidemiologic analyses; however, where such evidence is unavailable, knowledge can be drawn from toxicologic studies or other experimental research. Unfortunately, using toxicologic findings to develop informative priors in epidemiologic analyses requires strong assumptions, and no established method exists for doing so. We present a method to help bridge the gap between animal and cellular studies and epidemiologic research through the specification of an order-constrained prior. We illustrate this approach using an example from radiation epidemiology.
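A minimal sketch of an order-constrained prior, with hypothetical numbers on the log relative risk scale and assuming the external evidence is taken to imply that agent Y is at least as harmful as agent X: prior draws for the two coefficients are simply restricted, by rejection, to the region where the constraint holds.

```python
# Minimal sketch of an order-constrained prior (hypothetical numbers): prior mass
# is restricted to the region beta_X <= beta_Y on the log relative risk scale.
import numpy as np

rng = np.random.default_rng(5)
S = 200_000

# Unconstrained marginal priors for the two log relative risks.
beta_x = rng.normal(0.10, 0.25, S)
beta_y = rng.normal(0.10, 0.25, S)

# Impose the order constraint by rejection: keep only draws with beta_X <= beta_Y.
keep = beta_x <= beta_y
bx, by = beta_x[keep], beta_y[keep]

print(f"retained {keep.mean():.0%} of the unconstrained draws")
print(f"constrained prior mean RR for Y: {np.exp(by).mean():.2f} "
      f"(unconstrained: {np.exp(beta_y).mean():.2f})")
```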
Project description: Statistical modeling produces compressed and often more easily interpretable descriptions of experimental data in the form of model parameters. When experimental manipulations target selected parameters, interpreting those parameters requires that other model components remain constant. For example, psychophysicists use dose-response models to describe how behavior changes as a function of a single stimulus variable. The main interest is in shifts of this function induced by an experimental manipulation, assuming invariance in other aspects of the function. Combining several experimental conditions in a joint analysis that takes such invariance constraints into account can result in a complex model for which no robust standard procedures are available. We formulate a solution for the joint analysis through repeated applications of standard procedures by allowing an additional assumption. This way, experimental conditions can be analyzed separately such that all conditions are implicitly taken into account. We investigate the validity of the supplementary assumption through simulations. Furthermore, we present a natural way to check whether a joint treatment is appropriate. We illustrate the method for the specific case of the psychometric function; however, the procedure applies to other models that encompass multiple experimental conditions.
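The kind of joint model referred to above, though not the paper's estimation procedure, can be sketched as a logistic psychometric function with condition-specific thresholds and a slope constrained to be identical across conditions; the stimulus levels, trial counts, and generating parameters below are hypothetical.

```python
# Sketch of a joint psychometric-function model with an invariance constraint:
# condition-specific thresholds, one shared slope (hypothetical data).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)
levels = np.linspace(-3, 3, 9)                     # stimulus levels
n_trials = 40

# Simulate two conditions that differ only in threshold (shift), not slope.
true_thresh, true_slope = [-0.5, 0.8], 1.5
data = []
for th in true_thresh:
    p = expit(true_slope * (levels - th))
    data.append(rng.binomial(n_trials, p))

def neg_loglik(params):
    # params = [slope, threshold_cond0, threshold_cond1]; the slope is shared.
    slope, thresholds = params[0], params[1:]
    nll = 0.0
    for c, th in enumerate(thresholds):
        p = np.clip(expit(slope * (levels - th)), 1e-9, 1 - 1e-9)
        nll -= np.sum(data[c] * np.log(p) + (n_trials - data[c]) * np.log(1 - p))
    return nll

fit = minimize(neg_loglik, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
slope_hat, th0_hat, th1_hat = fit.x
print(f"shared slope {slope_hat:.2f}, thresholds {th0_hat:.2f}, {th1_hat:.2f}")
```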
Project description: We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing a homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. Altogether, we provide a versatile, general-purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
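The core computation can be sketched directly from the model description, keeping in mind that the dupiR package implements the full method in R: with a discrete uniform prior on the population size N and a binomial likelihood for the count observed after sampling a known fraction of the volume, the posterior over a grid of N values follows immediately. The count, sampling fraction, and prior support below are hypothetical.

```python
# Minimal sketch of the posterior for a population size N given a count k observed
# after sampling a known fraction f of the volume, with a discrete uniform prior on
# N (hypothetical numbers; the dupiR package implements the full method in R).
import numpy as np
from scipy.stats import binom

k, f = 4, 0.02                     # observed count and sampled volume fraction
N_grid = np.arange(k, 2001)        # discrete uniform prior support (N must be >= k)

# Binomial likelihood of observing k objects out of N when each is sampled w.p. f.
log_lik = binom.logpmf(k, N_grid, f)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

post_mean = np.sum(N_grid * post)
cdf = np.cumsum(post)
ci = (N_grid[np.searchsorted(cdf, 0.025)], N_grid[np.searchsorted(cdf, 0.975)])
print(f"posterior mean N = {post_mean:.0f}, 95% credible interval = {ci}")
```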