Project description:Aggressive overlapping of stochastic activities during phases of vaccine development has been critical to making effective vaccines for COVID-19 available to the public at "pandemic" speed. In cyclical projects wherein activities can be overlapped, downstream tasks may need rework on account of having commenced prior to receiving requisite information that is only available upon completion of upstream task(s). We provide a framework to understand the interplay between stochastic overlap duration and rework due to overlap, and its impact on minimizing expected completion time for a cyclical project. We motivate the problem using the new paradigm for planning vaccine development projects. It best exemplifies features and scenarios in our model that were not considered, and are not apparent, in the examples of cyclical development projects in the literature focused on engineered and manufactured products. We find that planning overlap in scenarios that may be deemed ineffective under an assumption of deterministic tasks can actually be beneficial when analyzed using stochastic task durations. We determine optimal planned start times for stochastic tasks as a function of a parameter that proxies for the extent of net gain/loss from overlap to minimize expected completion time for the project. We show that in situations with a net gain from overlap it is optimal to start the downstream task concurrently unless the downstream task does not stochastically dominate the upstream task and the net gain from overlap is not low enough. However, in situations with a net loss from overlap it is always optimal to have some degree of overlap in a stochastic task environment. 
We find that project rescheduling flexibility is always beneficial in a scenario with net loss from overlap and only beneficial in a scenario with net gain from overlap when the downstream task does not stochastically dominate the upstream task and the net gain from overlap is high enough. Our results on overlapping in 1-to-1, 1-to-n, and n-to-1 stochastic task configurations guide the development of an effective heuristic. Our heuristic offers good solution quality and is scalable to large networks as its computational complexity is linear in the number of tasks.
Project description:The COVID-19 pandemic has had devastating effects on human lives worldwide, highlighting the need for tools to predict its development. The dynamics of such public-health threats can often be analyzed efficiently through simple models that help to make timely, quantitative policy decisions. We benchmark a minimal version of a Susceptible-Infected-Removed model for infectious diseases (SIR) coupled with a simple least-squares Statistical Heuristic Regression (SHR) based on a lognormal distribution. We derive the three free parameters for both models in several cases and test how much data is needed to obtain accurate predictions. The SHR model is accurate to within about ±2% roughly 20 days past the second inflexion point in the daily curve of cases, while the SIR model reaches a similar accuracy a fortnight earlier. All the analyzed cases confirm the utility of SHR and SIR approximants as valuable tools to forecast the disease's evolution. Finally, we have studied simulated stochastic individual-based SIR dynamics, which yield a detailed spatial and temporal view of the disease that cannot be given by SIR or SHR methods.
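As a rough illustration of the kind of minimal SIR model the abstract above benchmarks, the following sketch integrates the standard normalized SIR equations with forward Euler steps. The function name `simulate_sir`, the parameter values, and the choice of `beta`, `gamma`, and `i0` as the free parameters are assumptions made for this example; they are not taken from the paper.

```python
def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Integrate the normalized SIR equations with forward Euler steps.

    beta  : transmission rate per day (hypothetical value in the demo below)
    gamma : removal rate per day (hypothetical value in the demo below)
    i0    : initial infected fraction of the population
    Returns one (s, i, r) tuple per day, including day 0.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    daily = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        daily.append((s, i, r))
    return daily


# Illustrative parameters only (beta / gamma = 3), not fitted to any data.
trajectory = simulate_sir(beta=0.3, gamma=0.1, i0=1e-4, days=200)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("day of peak infected fraction:", peak_day)
print("final removed fraction:", round(trajectory[-1][2], 3))
```

Fitting such a model to case data would add a least-squares step over the free parameters, which is omitted here.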
Project description:Newly emerging pandemics like COVID-19 call for predictive models to implement precisely tuned responses to limit their deep impact on society. Standard epidemic models provide a theoretically well-founded dynamical description of disease incidence. For COVID-19 with infectiousness peaking before and at symptom onset, the SEIR model explains the hidden build-up of exposed individuals which creates challenges for containment strategies. However, spatial heterogeneity raises questions about the adequacy of modeling epidemic outbreaks on the level of a whole country. Here, we show that by applying sequential data assimilation to the stochastic SEIR epidemic model, we can capture the dynamic behavior of outbreaks on a regional level. Regional modeling, with relatively low numbers of infected and demographic noise, accounts for both spatial heterogeneity and stochasticity. Based on adapted models, short-term predictions can be achieved. Thus, with the help of these sequential data assimilation methods, more realistic epidemic models are within reach.
Project description:Fitting Susceptible-Infected-Recovered (SIR) models to incidence data is problematic when not all infected individuals are reported. Assuming an underlying SIR model with general but known distribution for the time to recovery, this paper derives the implied differential-integral equations for observed incidence data when a fixed fraction of newly infected individuals are not observed. The parameters of the resulting system of differential equations are identifiable. Using these differential equations, we develop a stochastic model for the conditional distribution of current disease incidence given the entire past history of reported cases. We estimate the model parameters using Bayesian Markov Chain Monte-Carlo sampling of the posterior distribution. We use our model to estimate the transmission rate and fraction of asymptomatic individuals for the current Coronavirus 2019 outbreak in eight American countries: the United States of America, Brazil, Mexico, Argentina, Chile, Colombia, Peru, and Panama, from January 2020 to May 2021. Our analysis reveals that the fraction of reported cases varies across all countries. For example, the reported incidence fraction for the United States of America varies from 0.3 to 0.6, while for Brazil it varies from 0.2 to 0.4.
Project description:False negative rates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) diagnostic tests, together with selection bias due to prioritized testing, can result in inaccurate modeling of COVID-19 transmission dynamics based on reported "case" counts. We propose an extension of the widely used Susceptible-Exposed-Infected-Removed (SEIR) model that accounts for misclassification error and selection bias, and derive an analytic expression for the basic reproduction number R0 as a function of false negative rates of the diagnostic tests and selection probabilities for getting tested. Analyzing data from the first two waves of the pandemic in India, we show that correcting for misclassification and selection leads to more accurate prediction in a test sample. We provide estimates of undetected infections and deaths between April 1, 2020 and August 31, 2021. At the end of the first wave in India, as of February 1, 2021, the estimated under-reporting factor was 11.1 (95% CI: 10.7, 11.5) for cases and 3.58 (95% CI: 3.5, 3.66) for deaths; as of July 1, 2021, these changed to 19.2 (95% CI: 17.9, 19.9) and 4.55 (95% CI: 4.32, 4.68). Equivalently, 9.0% (95% CI: 8.7%, 9.3%) and 5.2% (95% CI: 5.0%, 5.6%) of total estimated infections were reported on these two dates, while 27.9% (95% CI: 27.3%, 28.6%) and 22% (95% CI: 21.4%, 23.1%) of estimated total deaths were reported. Extensive simulation studies demonstrate the effect of misclassification and selection on estimation of R0 and prediction of future infections. An R package, SEIRfansy, is developed for broader dissemination.
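The "equivalently" step in the abstract above is a simple reciprocal: if the under-reporting factor is the ratio of estimated total counts to reported counts (the usual definition, assumed here), then the percentage reported is 100 divided by that factor. A short check using the point estimates quoted in the abstract; the helper name `reported_percentage` is hypothetical:

```python
def reported_percentage(underreporting_factor):
    """Percentage of estimated totals that were reported.

    Assumes under-reporting factor = estimated total / reported count.
    """
    return 100.0 / underreporting_factor


# Point estimates quoted in the abstract (confidence intervals omitted).
point_estimates = {
    "cases, February 1, 2021": 11.1,
    "cases, July 1, 2021": 19.2,
    "deaths, February 1, 2021": 3.58,
    "deaths, July 1, 2021": 4.55,
}

for label, factor in point_estimates.items():
    print(f"{label}: {reported_percentage(factor):.1f}% reported")
```

The rounded results agree with the 9.0%, 5.2%, 27.9%, and 22% figures stated in the abstract.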
Project description:The ongoing outbreak of pneumonia caused by the novel coronavirus (2019-nCoV) began in December 2019 in Wuhan, China, and the number of new patients continues to increase. Even though it has begun to spread to many other parts of the world, such as other Asian countries, the Americas, Europe, and the Middle East, the impact of secondary outbreaks caused by exported cases outside China remains unclear. We conducted simulations to estimate the impact of potential secondary outbreaks in a community outside China. Simulations using a stochastic SEIR model were conducted, assuming one patient was imported to a community. Among the 45 possible scenarios we prepared, the worst scenario resulted in a total number of persons recovered or removed of 997 (95% CrI 990-1000) at day 100 and a maximum number of symptomatic infectious patients per day of 335 (95% CrI 232-478). The calculated mean basic reproductive number (R0) was 6.5 (interquartile range, IQR 5.6-7.2). However, better-case scenarios with different parameters led to no secondary cases. Altering parameters, especially time to hospital visit, could change the impact of a secondary outbreak. With these multiple scenarios with different parameters, healthcare professionals might be able to better prepare for this viral infection.
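To make the setup in the abstract above concrete, here is a minimal chain-binomial sketch of a stochastic SEIR model seeded with a single imported case. It is not the authors' model: the function name `simulate_seir`, the daily time step, the community size of 1000, the assumption that the imported patient is already infectious, and all rate values are illustrative assumptions.

```python
import math
import random


def simulate_seir(population, beta, sigma, gamma, days, seed=0):
    """Daily chain-binomial SEIR simulation starting from one imported case.

    beta  : transmission rate per day (hypothetical)
    sigma : rate of leaving the exposed state per day (hypothetical)
    gamma : removal rate per day (hypothetical)
    Returns a list of (S, E, I, R) integer tuples, one per day from day 0.
    """
    rng = random.Random(seed)

    def binomial(n, p):
        # Sum of n Bernoulli(p) draws; adequate for small populations.
        return sum(rng.random() < p for _ in range(n))

    s, e, i, r = population - 1, 0, 1, 0
    history = [(s, e, i, r)]
    for _ in range(days):
        p_infection = 1.0 - math.exp(-beta * i / population)
        p_onset = 1.0 - math.exp(-sigma)
        p_removal = 1.0 - math.exp(-gamma)
        new_exposed = binomial(s, p_infection)
        new_infectious = binomial(e, p_onset)
        new_removed = binomial(i, p_removal)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative run; the parameter values are not taken from the paper.
history = simulate_seir(population=1000, beta=0.6, sigma=0.2, gamma=0.1,
                        days=100, seed=1)
print("removed at day 100:", history[-1][3])
print("peak infectious count:", max(state[2] for state in history))
```

Repeating the run over many seeds and parameter sets gives a distribution of outcomes, including runs in which the single imported case causes no secondary cases.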
Project description:Truncation is a statistical phenomenon that occurs in many time-to-event studies. For example, autopsy-confirmed studies of neurodegenerative diseases are subject to an inherent left and right truncation, also known as double truncation. When the goal is to study the effect of risk factors on survival, the standard Cox regression model cannot be used when the survival time is subject to truncation. Existing methods that adjust for both left and right truncation in the Cox regression model require independence between the survival times and truncation times, which may not be a reasonable assumption in practice. We propose an expectation-maximization algorithm to relax the independence assumption in the Cox regression model under left, right, or double truncation to an assumption of conditional independence on the observed covariates. The resulting regression coefficient estimators are consistent and asymptotically normal. We demonstrate through extensive simulations that the proposed estimator has little bias and has a similar or lower mean-squared error compared to existing estimators. We implement our approach to assess the effect of occupation on survival in subjects with autopsy-confirmed Alzheimer's disease.
Project description:The false negative rate of the diagnostic RT-PCR test for SARS-CoV-2 has been reported to be substantial. Due to limited availability of testing, only a non-random subset of the population can get tested. Hence, the reported test counts are subject to a large degree of selection bias. We consider an extension of the Susceptible-Exposed-Infected-Removed (SEIR) model under both selection bias and misclassification. We derive a closed-form expression for the basic reproduction number under such data anomalies using the next generation matrix method. We conduct extensive simulation studies to quantify the effect of misclassification and selection on the resultant estimation and prediction of future case counts. Finally, we apply the methods to reported case-death-recovery count data from India, a nation with more than 5 million cases reported over the last seven months. We show that correcting for misclassification and selection can lead to more accurate prediction of case counts (and death counts) using the observed data as a beta tester. The model also provides an estimate of undetected infections and thus an under-reporting factor. For India, the estimated under-reporting factor for cases is around 21 and for deaths is around 6. We develop an R package (SEIRfansy) for broader dissemination of the methods.
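For readers unfamiliar with the next generation matrix method mentioned in the abstract above, the following worked example applies it to the textbook SEIR model without vital dynamics, not to the authors' extended model. The symbols β (transmission rate), σ (rate of leaving the exposed state), and γ (removal rate) are assumptions of this sketch. The infected compartments are E and I, and the system is linearized around the disease-free state with S ≈ N.

```latex
% Infected subsystem near the disease-free state (S \approx N):
%   \dot{E} = \beta I - \sigma E, \qquad \dot{I} = \sigma E - \gamma I
\[
F = \begin{pmatrix} 0 & \beta \\ 0 & 0 \end{pmatrix}, \quad
V = \begin{pmatrix} \sigma & 0 \\ -\sigma & \gamma \end{pmatrix}, \quad
V^{-1} = \begin{pmatrix} 1/\sigma & 0 \\ 1/\gamma & 1/\gamma \end{pmatrix}
\]
\[
F V^{-1} = \begin{pmatrix} \beta/\gamma & \beta/\gamma \\ 0 & 0 \end{pmatrix},
\qquad
R_0 = \rho\!\left(F V^{-1}\right) = \frac{\beta}{\gamma}.
\]
```

Here F collects the new-infection terms, V the transitions between compartments, and ρ denotes the spectral radius. The expression derived in the paper additionally depends on false negative rates and selection probabilities, which this simplified example does not model.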
Project description:We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, a nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected one. Our work combines the strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality in high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides a provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application analyzing protein mass spectroscopy data.
Project description:Describing the material flow stress and the associated uncertainty is essential for stochastic plastic structural analysis. In this context, a data-driven approach, heteroscedastic sparse Gaussian process regression (HSGPR) with enhanced efficiency, is introduced to model the material flow stress. Unlike other machine learning approaches, e.g. the artificial neural network (ANN), which only estimate the deterministic flow stress, the HSGPR model can capture the flow stress and its uncertainty simultaneously from the dataset. To validate the proposed model, experimental data for the Al 6061 alloy are used. Without any a priori assumption on the mathematical expression, the proposed HSGPR-based flow stress model predicts the experimental stress data at elevated temperatures better than the ANN model, the conventional GPR model, and the Johnson-Cook model. After the HSGPR-based flow stress model is implemented in finite element analysis, two numerical examples with synthetic material properties are performed to demonstrate the model's capability in stochastic plastic structural analysis. The results show that, with sufficient data, the distribution of the structural load-carrying capacity at elevated temperatures and the variation of load-displacement curves during the loading and unloading processes can be accurately predicted by the HSGPR-based flow stress model.