Project description:Mid-study design modifications are becoming increasingly accepted in confirmatory clinical trials, so long as appropriate methods are applied such that error rates are controlled. It is therefore unfortunate that the important case of time-to-event endpoints is not easily handled by the standard theory. We analyze current methods that allow design modifications to be based on the full interim data, i.e., not only the observed event times but also secondary endpoint and safety data from patients who are yet to have an event. We show that the final test statistic may ignore a substantial subset of the observed event times. An alternative test incorporating all event times is found, where a conservative assumption must be made in order to guarantee type I error control. We examine the power of this approach using the example of a clinical trial comparing two cancer therapies.
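A common building block for controlling the type I error rate under data-dependent modifications is a weighted inverse-normal combination of stagewise test statistics with prespecified weights. The sketch below shows that device in its simplest two-stage form; it is a generic illustration, not necessarily the specific test analyzed in this project, and the weights, alpha level, and z-values are illustrative assumptions.

```python
from math import sqrt
from scipy.stats import norm

def inverse_normal_combination(z1, z2, w1=sqrt(0.5), alpha=0.025):
    """Weighted inverse-normal combination of two stagewise z-statistics.
    With prespecified weights (w1^2 + w2^2 = 1) the combined statistic is N(0,1)
    under the null, provided the stagewise statistics are independent standard
    normals -- the delicate point for time-to-event endpoints."""
    w2 = sqrt(1.0 - w1 ** 2)
    z = w1 * z1 + w2 * z2
    return z, z > norm.ppf(1 - alpha)  # one-sided rejection decision

# hypothetical stagewise log-rank z-statistics
print(inverse_normal_combination(z1=1.1, z2=1.9))
```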
Project description:Technological advancements in the field of mobile devices and wearable sensors have helped overcome obstacles in the delivery of care, making it possible to deliver behavioral treatments anytime and anywhere. Here, we discuss our work on the design of a mobile health smoking cessation intervention study with the goal of assessing whether reminders, delivered at times of stress, result in a reduction/prevention of stress in the near term, and whether this effect changes with time in study. Multiple statistical challenges arose in this effort, leading to the development of the stratified micro-randomized trial design. In these designs, each individual is randomized to treatment repeatedly at times determined by predictions of risk. These risk times may be impacted by prior treatment. We describe the statistical challenges and detail how they can be met.
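A toy sketch of the randomization scheme described above: decision times are classified by predicted risk, and treatment is micro-randomized with a stratum-specific probability. The strata, decision times, and probabilities below are hypothetical placeholders, not the study's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)

def micro_randomize(decision_times, p_by_stratum):
    """At each risk-classified decision time, randomize the participant to a
    reminder (True) or no reminder (False) with a stratum-specific probability."""
    return [(t, stratum, bool(rng.random() < p_by_stratum[stratum]))
            for t, stratum in decision_times]

# hypothetical decision times for one participant-day, classified by predicted stress risk
decision_times = [(9, "stressed"), (12, "not_stressed"), (15, "stressed"), (18, "not_stressed")]
print(micro_randomize(decision_times, p_by_stratum={"stressed": 0.5, "not_stressed": 0.5}))
```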
Project description:BACKGROUND/AIMS: Clinical trials for Alzheimer's disease have been aimed primarily at persons who have cognitive symptoms at enrollment. However, researchers are now recognizing that the pathophysiological process of Alzheimer's disease begins years, if not decades, prior to the onset of clinical symptoms. Successful intervention may require intervening early in the disease process. Critical issues arise in designing clinical trials for primary and secondary prevention of Alzheimer's disease, including determination of sample sizes and follow-up duration. We address a number of these issues through application of a unifying multistate model for the preclinical course of Alzheimer's disease. A multistate model allows us to specify at which points during the long disease process the intervention exerts its effects. METHODS: We used a nonhomogeneous Markov multistate model for the progression of Alzheimer's disease through preclinical disease states defined by biomarkers, mild cognitive impairment and Alzheimer's disease dementia. We used transition probabilities based on several published cohort studies. Sample size methods were developed that account for factors including the initial preclinical disease state of trial participants, the primary endpoint, age-dependent transition and mortality rates and specifications of which transition rates are the targets of the intervention. RESULTS: We find that Alzheimer's disease prevention trials with a clinical primary endpoint of mild cognitive impairment or Alzheimer's disease dementia will require sample sizes on the order of many thousands of individuals with at least 5 years of follow-up, which is larger than most Alzheimer's disease therapeutic trials conducted to date. The reasons for the large trial sizes include the long and variable preclinical period that spans decades, high rates of attrition among elderly populations due to mortality and loss to follow-up, and potential selection effects whereby healthier subjects enroll in prevention trials. A web application is available to perform sample size calculations using the methods reported here. CONCLUSION: Sample sizes based on multistate models can account for the points in the disease process when interventions exert their effects and may lead to more accurate sample size determinations. We will need innovative strategies to help design Alzheimer's disease prevention trials with feasible sample size requirements and durations of follow-up.
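To make the multistate logic concrete, here is a minimal sketch in which a discrete-time Markov chain over preclinical, MCI, dementia, and death states yields the endpoint probability under control and intervention, and a two-proportion formula converts that into a per-arm sample size. The transition probabilities and the assumed 30% reduction in the preclinical-to-MCI transition are hypothetical placeholders, so the printed numbers are illustrative and will not reproduce the trial sizes quoted above.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical annual transition matrix:
# 0 = preclinical (biomarker-positive), 1 = MCI, 2 = AD dementia, 3 = death
P_control = np.array([
    [0.85, 0.08, 0.02, 0.05],
    [0.00, 0.78, 0.15, 0.07],
    [0.00, 0.00, 0.90, 0.10],
    [0.00, 0.00, 0.00, 1.00],
])

def endpoint_prob(P, years, start=0, endpoint_states=(1, 2)):
    """Probability of reaching MCI or AD dementia within `years`,
    treating the endpoint states as absorbing."""
    Pa = P.copy()
    for s in endpoint_states:
        Pa[s] = 0.0
        Pa[s, s] = 1.0
    dist = np.linalg.matrix_power(Pa, years)[start]
    return dist[list(endpoint_states)].sum()

def n_per_arm(p0, p1, alpha=0.05, power=0.9):
    """Two-proportion normal-approximation sample size (per arm)."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    pbar = (p0 + p1) / 2
    return (za * np.sqrt(2 * pbar * (1 - pbar)) +
            zb * np.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p0 - p1) ** 2

# Intervention assumed to reduce the preclinical -> MCI transition by 30% (hypothetical)
P_treat = P_control.copy()
P_treat[0, 1] *= 0.7
P_treat[0, 0] += 0.3 * P_control[0, 1]   # keep the row summing to one

p0, p1 = endpoint_prob(P_control, 5), endpoint_prob(P_treat, 5)
print(p0, p1, np.ceil(n_per_arm(p0, p1)))
```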
Project description:Background: The current CONSORT guidelines for reporting pilot trials do not recommend hypothesis testing of clinical outcomes, on the basis that a pilot trial is under-powered to detect such differences and that this is the aim of the main trial. They state that primary evaluation should focus on descriptive analysis of feasibility/process outcomes (e.g. recruitment, adherence, treatment fidelity). Whilst the argument for not testing clinical outcomes is justifiable, the same does not necessarily apply to feasibility/process outcomes, where differences may be large and detectable with small samples. Moreover, there remains much ambiguity around sample size for pilot trials. Methods: Many pilot trials adopt a 'traffic light' system for evaluating progression to the main trial, determined by a set of criteria specified a priori. We construct a hypothesis testing approach for binary feasibility outcomes focused around this system that tests against being in the RED zone (unacceptable outcome) based on an expectation of being in the GREEN zone (acceptable outcome), and choose the sample size to give high power to reject being in the RED zone if the GREEN zone holds true. Pilot point estimates falling in the RED zone will be statistically non-significant and those in the GREEN zone will be significant; the AMBER zone designates a potentially acceptable outcome, where statistical tests may be significant or non-significant. Results: For example, in relation to treatment fidelity, if we assume the upper boundary of the RED zone is 50% and the lower boundary of the GREEN zone is 75% (designating unacceptable and acceptable treatment fidelity, respectively), the sample size required for analysis given 90% power and a one-sided 5% alpha would be around n = 34 (intervention group alone). Observed treatment fidelity in the range of 0-17 participants (0-50%) will fall into the RED zone and be statistically non-significant, 18-25 (51-74%) will fall into AMBER and may or may not be significant, and 26-34 (75-100%) will fall into GREEN and will be significant, indicating acceptable fidelity. Discussion: In general, several key process outcomes are assessed for progression to a main trial; a composite approach would require appraising the rules of progression across all these outcomes. This methodology provides a formal framework for hypothesis testing and sample size indication around process outcome evaluation for pilot RCTs.
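The calculation behind the worked example can be checked with an exact one-sided binomial test of H0: fidelity ≤ 50%, powered at the GREEN boundary of 75%. The sketch below searches for the smallest n and its critical count; the exact-test answer may differ slightly from the approximate n ≈ 34 quoted above, depending on the approximation used in the original calculation.

```python
from scipy.stats import binom

def critical_value(n, p_red, alpha):
    """Smallest count c with P(X >= c | n, p_red) <= alpha (exact one-sided test)."""
    for c in range(n + 1):
        if binom.sf(c - 1, n, p_red) <= alpha:
            return c
    return n + 1

def smallest_n(p_red=0.50, p_green=0.75, alpha=0.05, power=0.90, n_max=200):
    """Smallest n whose exact test of H0: p <= p_red rejects with probability
    >= power when the true rate equals the GREEN boundary p_green."""
    for n in range(2, n_max + 1):
        c = critical_value(n, p_red, alpha)
        if c <= n and binom.sf(c - 1, n, p_green) >= power:
            return n, c
    return None

print(smallest_n())  # (n, critical count); observed counts below c are non-significant
```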
Project description:Background: Cluster randomized trials have been utilized to evaluate the effectiveness of HIV prevention strategies on reducing incidence. Design of such studies must take into account possible correlation of outcomes within randomized units. Purpose: To discuss power and sample size considerations for cluster randomized trials of combination HIV prevention, using an HIV prevention study in Botswana as an illustration. Methods: We introduce a new agent-based model to simulate the community-level impact of a combination prevention strategy and investigate how the correlation structure within a community affects the coefficient of variation - an essential parameter in designing a cluster randomized trial. Results: We construct collections of sexual networks and then propagate HIV on them to simulate the disease epidemic. An increasing level of sexual mixing between intervention and standard-of-care (SOC) communities reduces the difference in cumulative incidence between the two sets of communities. Fifteen clusters per arm and 500 incidence cohort members per community provide 95% power to detect the projected difference in cumulative HIV incidence between SOC and intervention communities (3.93% and 2.34%) at the end of the third study year, using a coefficient of variation of 0.25. Although available formulas for calculating sample size for cluster randomized trials can be derived by assuming an exchangeable correlation structure within clusters, we show that deviations from this assumption do not generally affect the validity of such formulas. Limitations: We construct sexual networks based on data from Likoma Island, Malawi, and base disease progression on longitudinal estimates from an incidence cohort in Botswana and in Durban as well as a household survey in Mochudi, Botswana. Network data from Botswana and larger sample sizes to estimate rates of disease progression would be useful in assessing the robustness of our model results. Conclusion: Epidemic modeling plays a critical role in planning and evaluating interventions for prevention. Simulation studies allow us to take into consideration available information on sexual network characteristics, such as mixing within and between communities, as well as coverage levels for different prevention modalities in the combination prevention package.
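As a rough cross-check of these design numbers, the sketch below applies a standard Hayes-and-Bennett-style power formula for an unmatched cluster-randomized trial with a binary outcome and a between-cluster coefficient of variation. This closed form is an assumption on our part for illustration; it is not the agent-based simulation the study actually used.

```python
from math import sqrt
from scipy.stats import norm

def crt_power(clusters_per_arm, m, p1, p2, cv, alpha=0.05):
    """Approximate power for an unmatched two-arm cluster-randomized trial with a
    binary outcome, m subjects per cluster, and between-cluster coefficient of
    variation cv (Hayes & Bennett-style formula)."""
    var = (p1 * (1 - p1) + p2 * (1 - p2)) / m + cv ** 2 * (p1 ** 2 + p2 ** 2)
    z_beta = sqrt((clusters_per_arm - 1) * (p1 - p2) ** 2 / var) - norm.ppf(1 - alpha / 2)
    return norm.cdf(z_beta)

# design parameters quoted above; the formula is only a closed-form approximation
print(crt_power(clusters_per_arm=15, m=500, p1=0.0393, p2=0.0234, cv=0.25))
```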
Project description:Background: The group testing method has been proposed for the detection and estimation of genetically modified plants (adventitious presence of unwanted transgenic plants, AP). For binary response variables (presence or absence), group testing is efficient when the prevalence is low, so that estimation, detection, and sample size methods have been developed under the binomial model. However, when the event is rare (low prevalence <0.1) and testing occurs sequentially, inverse (negative) binomial pooled sampling may be preferred. Methodology/Principal findings: This research proposes three sample size procedures (two computational and one analytic) for estimating prevalence using group testing under inverse (negative) binomial sampling. These methods provide the required number of positive pools ([Formula: see text]), given a pool size (k), for estimating the proportion of AP plants using the Dorfman model and inverse (negative) binomial sampling. We give real and simulated examples to show how to apply these methods and the proposed sample-size formula. The Monte Carlo method was used to study the coverage and level of assurance achieved by the proposed sample sizes. An R program to create other scenarios is given in Appendix S2. Conclusions: The three methods ensure precision in the estimated proportion of AP because they guarantee that the width (W) of the confidence interval (CI) will be equal to, or narrower than, the desired width ([Formula: see text]), with a probability of [Formula: see text]. With the Monte Carlo study we found that the computational Wald procedure (method 2) produces the more precise sample size (with coverage and assurance levels very close to nominal values), that the sample size based on the Clopper-Pearson CI (method 1) is conservative (overestimates the sample size), and that the analytic Wald sample size method we developed (method 3) sometimes underestimated the optimum number of pools.
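A minimal Monte Carlo sketch of the assurance idea: pools of size k are tested until r are positive, the prevalence is back-calculated from the pool-positivity rate, and the proportion of simulated experiments whose interval is narrower than the target width is reported. The delta-method Wald interval and all numeric inputs here are our own illustrative assumptions, not the paper's exact procedures.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def wald_width(theta_hat, r, k, conf=0.95):
    """Delta-method Wald CI width for the prevalence p = 1 - (1 - theta)^(1/k),
    using the asymptotic variance theta^2 (1 - theta) / r of the pool-positivity
    MLE under inverse (negative) binomial sampling."""
    z = norm.ppf(1 - (1 - conf) / 2)
    se_theta = np.sqrt(theta_hat ** 2 * (1 - theta_hat) / r)
    dp_dtheta = (1.0 / k) * (1 - theta_hat) ** (1.0 / k - 1)
    return 2 * z * se_theta * dp_dtheta

def assurance(p, k, r, omega, n_sim=20000, conf=0.95):
    """Monte Carlo probability that the CI width is <= omega when pools of size k
    are tested until r positive pools are observed and the true prevalence is p."""
    theta = 1.0 - (1.0 - p) ** k              # probability that a pool tests positive
    negatives = rng.negative_binomial(r, theta, size=n_sim)
    theta_hat = r / (r + negatives)            # MLE of the pool-positivity probability
    return float(np.mean(wald_width(theta_hat, r, k, conf) <= omega))

# hypothetical inputs: prevalence 1%, pools of 10 plants, stop after 40 positive pools
print(assurance(p=0.01, k=10, r=40, omega=0.01))
```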
Project description:The stepped wedge cluster randomized trial (SW-CRT) is an increasingly popular design for evaluating health service delivery or policy interventions. An essential consideration of this design is the need to account for both within-period and between-period correlations in sample size calculations. Especially when embedded in health care delivery systems, many SW-CRTs may have subclusters nested in clusters, within which outcomes are collected longitudinally. However, existing sample size methods that account for between-period correlations have not allowed for multiple levels of clustering. We present computationally efficient sample size procedures that properly differentiate within-period and between-period intracluster correlation coefficients in SW-CRTs in the presence of subclusters. We introduce an extended block exchangeable correlation matrix to characterize the complex dependencies of outcomes within clusters. For Gaussian outcomes, we derive a closed-form sample size expression that depends on the correlation structure only through two eigenvalues of the extended block exchangeable correlation structure. For non-Gaussian outcomes, we present a generic sample size algorithm based on linearization and elucidate simplifications under canonical link functions. For example, we show that the approximate sample size formula under a logistic linear mixed model depends on three eigenvalues of the extended block exchangeable correlation matrix. We provide an extension to accommodate unequal cluster sizes and validate the proposed methods via simulations. Finally, we illustrate our methods in two real SW-CRTs with subclusters.
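The flavor of these calculations can be seen in a much simpler setting: the sketch below computes power for a standard cross-sectional stepped wedge design by GLS on cluster-period means under a block exchangeable correlation with within-period and between-period ICCs, using a single level of clustering and no subclusters. It is a generic illustration under assumed parameter values, not the extended block exchangeable method with subclusters derived in this project.

```python
import numpy as np
from scipy.stats import norm

def sw_crt_power(I, T, m, delta, sigma2, wpicc, bpicc, alpha=0.05):
    """Power for a cross-sectional stepped wedge CRT via GLS on cluster-period means,
    with within-period ICC wpicc and between-period ICC bpicc (single-level clustering)."""
    # cluster-period mean covariance implied by the two ICCs and total variance sigma2
    diag = sigma2 * (wpicc + (1 - wpicc) / m)
    off = sigma2 * bpicc
    V = np.full((T, T), off) + np.eye(T) * (diag - off)
    Vinv = np.linalg.inv(V)

    # complete stepped wedge: clusters cross over at staggered periods
    # (assumes I is a multiple of T - 1, one sequence per group of clusters)
    seqs = np.repeat(np.arange(1, T), I // (T - 1))   # first intervention period per cluster
    info = np.zeros((T + 1, T + 1))                    # T period effects + treatment effect
    for s in seqs:
        X = np.hstack([np.eye(T), (np.arange(T) >= s).astype(float)[:, None]])
        info += X.T @ Vinv @ X
    var_delta = np.linalg.inv(info)[-1, -1]
    return norm.cdf(abs(delta) / np.sqrt(var_delta) - norm.ppf(1 - alpha / 2))

# hypothetical design: 12 clusters, 5 periods, 20 subjects per cluster-period
print(sw_crt_power(I=12, T=5, m=20, delta=0.3, sigma2=1.0, wpicc=0.05, bpicc=0.03))
```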
Project description:In microarray data analysis, we are often required to combine several dependent partial test results. Many approaches to this problem have been suggested in the literature; Tippett's test and Fisher's omnibus test are the most popular. Both tests have known null distributions when the partial tests are independent. However, for dependent tests, their null distributions (even asymptotically) are unknown and additional numerical procedures are required. In this paper, we revisited Stouffer's test based on z-scores and showed its advantage over the two aforementioned methods in the analysis of large-scale microarray data. The combined statistic in Stouffer's test has a normal distribution with mean 0 owing to the normality of the z-scores. Its variance can be estimated from the scores of genes in the experiment without an additional numerical procedure. We numerically compared the errors of Stouffer's test and the two p-value-based methods, Tippett's test and Fisher's omnibus test. We also analyzed our microarray data to find genes differentially expressed under non-genotoxic and genotoxic carcinogen compounds. Both the numerical study and the real application showed that Stouffer's test performed better than Tippett's method and Fisher's omnibus method, which required additional permutation steps.
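A minimal sketch of how such a combined z-score test might look in practice: the partial z-scores for each gene are summed, and the null variance of the sum is estimated empirically across all genes (here with a robust MAD estimate), which assumes most genes are non-differential. The simulated data and the choice of the MAD estimator are our own illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def stouffer_dependent(z):
    """Combine dependent partial z-scores per gene (rows = genes, cols = partial tests).
    The null variance of the summed statistic is estimated empirically across genes,
    assuming most genes are non-differentially expressed."""
    t = z.sum(axis=1)
    sd = 1.4826 * np.median(np.abs(t - np.median(t)))   # robust spread of the sums
    p = 2 * norm.sf(np.abs(t) / sd)                      # two-sided p-values
    return t, p

# toy example: 1000 genes, 3 positively correlated null partial tests per gene
rng = np.random.default_rng(0)
shared = rng.standard_normal((1000, 1))
z = 0.6 * shared + 0.8 * rng.standard_normal((1000, 3))
t, p = stouffer_dependent(z)
print(p[:5])
```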
Project description:In the problem of composite hypothesis testing, identifying the potential uniformly most powerful (UMP) unbiased test is of great interest. Beyond typical hypothesis settings within the exponential family, it is usually challenging to prove the existence of, and further construct, such UMP unbiased tests with a finite sample size. For example, in the COVID-19 pandemic, where limited prior assumptions can be made about the treatment under investigation and the standard of care, adaptive clinical trials are appealing because of ethical considerations and their ability to accommodate uncertainty while the trial is conducted. Although several methods have been proposed to control Type I error rates, how to find a more powerful hypothesis testing strategy is still an open question. Motivated by this problem, we propose an automatic framework for constructing test statistics and corresponding critical values via machine learning methods to enhance power in a finite sample. In this article, we illustrate the performance using Deep Neural Networks (DNNs) in particular and discuss their advantages. Simulations and two case studies of adaptive designs demonstrate that our method is automatic, general, and prespecified, and constructs statistics with satisfactory power in finite samples. Supplemental materials are available online, including R code and an R Shiny app.
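The finite-sample calibration step at the heart of such an approach can be sketched generically: whatever statistic is learned, its critical value is taken from the empirical null distribution obtained by simulating under H0, and power is then estimated under H1. The plain mean statistic and normal data below are placeholders standing in for a learned (e.g. DNN-based) statistic and a real trial design.

```python
import numpy as np

def calibrate_and_evaluate(statistic, simulate_null, simulate_alt,
                           alpha=0.05, n_sim=5000, seed=0):
    """Monte Carlo calibration of a critical value for an arbitrary (possibly learned)
    test statistic so that the finite-sample Type I error is approximately alpha,
    followed by a power estimate under the alternative."""
    rng = np.random.default_rng(seed)
    t_null = np.array([statistic(simulate_null(rng)) for _ in range(n_sim)])
    crit = np.quantile(t_null, 1 - alpha)                 # empirical (1 - alpha) quantile
    t_alt = np.array([statistic(simulate_alt(rng)) for _ in range(n_sim)])
    return crit, float(np.mean(t_alt > crit))

# placeholder statistic and data-generating mechanisms (hypothetical, n = 50 per dataset)
crit, power = calibrate_and_evaluate(
    statistic=lambda x: x.mean(),
    simulate_null=lambda rng: rng.normal(0.0, 1.0, 50),
    simulate_alt=lambda rng: rng.normal(0.3, 1.0, 50),
)
print(crit, power)
```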
Project description:Background: Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research, the purpose of this study was to explore the relationship between fit statistics and sample size for polytomous data. Methods: Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire-9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating Scale and Partial Credit Models were applied, and the mean square and t-fit statistics (infit/outfit) were derived for each model. Results: The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion: It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.
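The sample-size sensitivity of the t-statistics can be seen directly from the usual Wilson-Hilferty standardization that converts a mean square into a standardized (ZSTD) value: for a fixed degree of misfit, the standardized statistic grows as the model variance of the mean square shrinks with n. In the sketch below, q is approximated as sqrt(2/n) purely for illustration; the true model variance of the mean square depends on the item and person parameters.

```python
import numpy as np

def ms_to_zstd(ms, q):
    """Wilson-Hilferty standardization used for Rasch infit/outfit statistics:
    converts a mean square (ms) to an approximately N(0,1) value, where q is the
    model standard deviation of the mean square."""
    return (ms ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0

# the same modest misfit (MS = 1.2) evaluated at increasing sample sizes
for n in [25, 100, 400, 1600, 3200]:
    q = np.sqrt(2.0 / n)   # rough stand-in for the model standard deviation of the MS
    print(n, round(float(ms_to_zstd(1.2, q)), 2))
```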