Project description:The coronavirus disease 2019 (COVID-19) has spread rapidly across the world since its appearance in December 2019. This data set contains one-, three-, and seven-day forecasts of the COVID-19 pandemic's cumulative case counts at the county, health district, and state geographic levels for the state of Virginia. Forecasts are created over the first 46 days of reported COVID-19 cases using the cumulative case count data provided by The New York Times as of April 22, 2020. From this historical data, one-, three-, seven-, and all-days prior to the forecast start date are used to generate the forecasts. Forecasts are created using: (1) a Naïve approach; (2) Holt-Winters exponential smoothing (HW); (3) growth rate (Growth); (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are computed for each forecast to evaluate it against existing historical data. These error metrics are aggregated to provide a means for assessing which combination of forecast method, forecast length, and lookback length fits best, based on lowest aggregated error at each geographic level. The data set comprises an R-Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables. All code and data files are included to provide transparency and facilitate replicability and reproducibility. This package opens directly in RStudio through the R Project file. The R Project file removes the need to set path locations for the folders contained within the data set, which simplifies setup.
This data set provides two avenues for reproducing results: 1) Use the provided code to generate the forecasts from scratch and then run the analyses; or 2) Load the saved forecast data and run the analyses on the stored data. Code annotations provide the instructions needed to accomplish both routes. This data set can be used to generate the same set of forecasts and error metrics for any US state by altering the state parameter within the source code. Users can also generate health district forecasts for any other state by providing a file which maps each county within a state to its respective health district. Connecting the source code to the most up-to-date version of The New York Times COVID-19 dataset allows forecasts to be generated up to the most recently reported data, facilitating near real-time forecasting.
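As a rough illustration of the two error metrics named above, the following Python sketch scores a naive forecast with MdAE and MdAPE. It is not the data set's R code: the function names and the case counts are made up for illustration, and it assumes that the naive approach carries the last observed value forward, which the description does not spell out.

```python
from statistics import median


def naive_forecast(history, horizon):
    """Assumed naive rule: repeat the last observed value `horizon` times."""
    return [history[-1]] * horizon


def mdae(actual, predicted):
    """Median Absolute Error."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def mdape(actual, predicted):
    """Median Absolute Percentage Error, in percent (zero actuals are skipped)."""
    return 100 * median(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0
    )


# Hypothetical cumulative case counts, not taken from the data set.
cumulative_cases = [10, 14, 20, 27, 35, 46, 58]
history, actual = cumulative_cases[:4], cumulative_cases[4:]

forecast = naive_forecast(history, horizon=len(actual))
print("forecast:", forecast)
print("MdAE:", mdae(actual, forecast))
print("MdAPE (%):", mdape(actual, forecast))
```

Medians make both metrics less sensitive to a few badly missed days than their mean-based counterparts, which is one plausible reason to prefer them when aggregating over many short forecasts.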
Project description:Background: Although vaccination is one of the main countermeasures against influenza epidemics, informed prevention decisions are essential to guarantee that limited vaccination resources are allocated to the places where they are most needed. Hence, one of the fundamental steps for decision making in influenza prevention is to characterize its spatio-temporal trend, in particular how influenza transmits among adjacent places and how much impact influenza in one place can have on its neighbors. To solve this problem while avoiding too much additional time-consuming work on data collection, this study proposed a new concept of spatio-temporal route, together with estimation methods, to construct the influenza transmission network. Methods: Influenza-like illness (ILI) data for 21 cities in Sichuan province were collected from 2010 to 2016. A joint approach based on the dynamic Bayesian network (DBN) model and the vector autoregressive moving average (VARMA) model was used to estimate the spatio-temporal routes, with the two models applied to the two stages of the learning process, namely structure learning and parameter learning, respectively. In structure learning, the first-order conditional dependencies approximation algorithm was used to generate the DBN, which could visualize the spatio-temporal routes of influenza among adjacent cities and infer which cities have impacts on others in influenza transmission. In parameter learning, the VARMA model was adopted to estimate the strength of these impacts. Finally, all the estimated spatio-temporal routes were put together to form the final influenza transmission network. Results: The results showed that the period of the influenza transmission cycle was longer in Western Sichuan and the Chengdu Plain than in Northeastern Sichuan, and that there may be potential spatio-temporal routes of influenza from bordering provinces or municipalities into Sichuan province.
Furthermore, this study also pointed out several estimated spatio-temporal routes with relatively high strength of association, which could serve as clues for detecting hot spot areas in influenza surveillance. Conclusions: This study proposed a new framework for exploring the potentially stable spatio-temporal routes between different places and measuring the specific sizes of transmission effects. It could help make timely and reliable predictions of the spatio-temporal trend of infectious diseases, and further determine the possible key areas of the next epidemic by considering their neighbors' incidence and the transmission relationships.
Project description:Objectives: Prostate cancer is the second most common cause of cancer-related death in males after lung cancer, imposing a significant burden on the healthcare system in Australia. We propose the use of autoregressive integrated moving average (ARIMA) models in conjunction with population forecasts to provide robust annual projections of prostate cancer. Design: Data on prostate cancer incidence and mortality were obtained from the Australian Institute of Health and Welfare. We formulated several ARIMA models with different autocorrelation terms and chose one which provided an accurate fit of the data based on the mean absolute percentage error (MAPE). We also assessed the model for external validity. A similar process was used to model age-standardised incidence and mortality rates for prostate cancer in Australia during the same time period. Results: The annual number of prostate cancer cases diagnosed in Australia increased from 3606 in 1982 to 20 065 in 2012. Two peaks were observed, around 1994 and 2009. Among the various models evaluated, we found that the model with an autoregressive term of 1 (coefficient=0.45, p=0.028) combined with differencing of the series provided the best fit, with a MAPE of 5.2%. External validation also showed a good MAPE of 5.8%. We project prostate cancer incident cases in 2022 to rise to 25 283 cases (95% CI: 23 233 to 27 333). Conclusion: Our study has accurately characterised the trend of prostate cancer incidence and mortality in Australia, and this information will prove useful for resource planning and manpower allocation.
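To make the idea of "an autoregressive term of 1 plus differencing" scored by MAPE concrete, here is a minimal Python sketch. It is a simplification and not the study's procedure: it fits the AR(1) coefficient on first differences by ordinary least squares without an intercept, whereas statistical packages typically estimate ARIMA models by maximum likelihood. The function names and the series are hypothetical.

```python
def fit_ar1_on_differences(series):
    """Least-squares AR(1) coefficient for the first differences (no intercept)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    numerator = sum(prev * curr for prev, curr in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator


def forecast(series, phi, steps):
    """Iterate d[t+1] = phi * d[t], then undo the differencing."""
    last_value = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff
        last_value = last_value + last_diff
        predictions.append(last_value)
    return predictions


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical annual counts whose year-on-year change halves each year.
series = [100.0, 110.0, 115.0, 117.5, 118.75, 119.375, 119.6875]
train, holdout = series[:5], series[5:]

phi = fit_ar1_on_differences(train)
predictions = forecast(train, phi, steps=len(holdout))
print("phi:", phi)
print("hold-out MAPE (%):", mape(holdout, predictions))
```

Holding out the last observations and scoring them with MAPE mirrors the external-validation step in spirit. Because this toy series is noise-free, the hold-out error is essentially zero, which real incidence data would not show.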
Project description:Background: Measuring hepatic R2* by fitting a monoexponential model to the signal decay of a multigradient-echo (mGRE) sequence noninvasively determines hepatic iron content (HIC). Concurrent hepatic steatosis introduces signal oscillations and confounds R2* quantification with standard monoexponential models. Purpose: To evaluate an autoregressive moving average (ARMA) model for accurate quantification of HIC in the presence of fat using biopsy as the reference. Study type: Phantom study and in vivo cohort. Population: Twenty iron-fat phantoms covering clinically relevant R2* (30-800 s⁻¹) and fat fraction (FF) ranges (0-40%), and 10 patients (four male, six female, mean age 18.8 years). Field strength/sequence: 2D mGRE acquisitions at 1.5 T and 3 T. Assessment: Phantoms were scanned at both field strengths. In vivo data were analyzed using the ARMA model to determine R2* and FF values, and compared with biopsy results. Statistical tests: Linear regression analysis was used to compare ARMA R2* and FF results with those obtained using a conventional monoexponential model, complex-domain nonlinear least squares (NLSQ) fat-water model, and biopsy. Results: In phantoms and in vivo, all models produced R2* and FF values consistent with expected values in low iron and low/high fat conditions. For high iron and no fat phantoms, monoexponential and ARMA models performed excellently (slopes: 0.89-1.07), but NLSQ overestimated R2* (slopes: 1.14-1.36) and produced false FFs (12-17%) at 1.5 T; in high iron and fat phantoms, NLSQ (slopes: 1.02-1.16) outperformed monoexponential and ARMA models (slopes: 1.23-1.88). The results with NLSQ and ARMA improved in phantoms at 3 T (slopes: 0.96-1.04).
In patients, mean R2*-HIC estimates for monoexponential and ARMA models were close to biopsy-HIC values (slopes: 0.90-0.95), whereas NLSQ substantially overestimated HIC (slope 1.4) and produced false FF values (4-28%) with very high SDs (15-222%) in patients with high iron overload and no steatosis. Data conclusion: ARMA is superior in quantifying R2* and FF under high iron and no fat conditions, whereas NLSQ is superior for high iron and concurrent fat at 1.5 T. Both models give improved R2* and FF results at 3 T. Level of evidence: 2. Technical Efficacy Stage: 2. J. Magn. Reson. Imaging 2019;50:1620-1632.
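The monoexponential baseline mentioned in the background can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the signal model S(TE) = S0 · exp(−R2* · TE) and uses a log-linear least-squares fit, which is one common way to fit such a decay and not necessarily the fitting procedure used in the study. The echo times and signal values are synthetic, and the sketch does not model the fat-induced oscillations that motivate the ARMA and NLSQ approaches.

```python
import math


def fit_r2star(echo_times_s, signals):
    """Fit ln S = ln S0 - R2* * TE by ordinary least squares.

    Returns (r2star in 1/s, s0). Assumes strictly positive signals.
    """
    n = len(echo_times_s)
    log_signals = [math.log(s) for s in signals]
    mean_te = sum(echo_times_s) / n
    mean_log = sum(log_signals) / n
    sxy = sum((te - mean_te) * (y - mean_log) for te, y in zip(echo_times_s, log_signals))
    sxx = sum((te - mean_te) ** 2 for te in echo_times_s)
    slope = sxy / sxx
    s0 = math.exp(mean_log - slope * mean_te)
    return -slope, s0


# Synthetic, noise-free decay: eight echoes from 1 ms to 8 ms (values are made up).
echo_times = [0.001 * k for k in range(1, 9)]
true_r2star, true_s0 = 200.0, 1000.0
signals = [true_s0 * math.exp(-true_r2star * te) for te in echo_times]

r2star, s0 = fit_r2star(echo_times, signals)
print("estimated R2* (1/s):", r2star)
print("estimated S0:", s0)
```

With fat present, the measured decay is no longer a single exponential, so a fit of this form would be biased, which is the confounding effect the abstract describes.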
Project description:Influenza A virus commonly circulating in swine (IAV-S) is characterized by large genetic and antigenic diversity and, thus, improvements in different aspects of IAV-S surveillance are needed to achieve desirable goals of surveillance such as to establish the capacity to forecast with the greatest accuracy the number of influenza cases likely to arise. Advancements in modeling approaches provide the opportunity to use different models for surveillance. However, in order to make improvements in surveillance, it is necessary to assess the predictive ability of such models. This study compares the sensitivity and predictive accuracy of the autoregressive integrated moving average (ARIMA) model, the generalized linear autoregressive moving average (GLARMA) model, and the random forest (RF) model with respect to the frequency of influenza A virus (IAV) in Ontario swine. Diagnostic data on IAV submissions in Ontario swine between 2007 and 2015 were obtained from the Animal Health Laboratory (University of Guelph, Guelph, ON, Canada). Each modeling approach was examined for predictive accuracy, evaluated by the root mean square error, the normalized root mean square error, and the model's ability to anticipate increases and decreases in disease frequency. Likewise, we verified the magnitude of improvement offered by the ARIMA, GLARMA and RF models over a seasonal-naïve method. Using the diagnostic submissions, the occurrence of seasonality and the long-term trend in IAV infections were also investigated. The RF model had the smallest root mean square error in the prospective analysis and tended to predict increases in the number of diagnostic submissions and positive virological submissions at weekly and monthly intervals with a higher degree of sensitivity than the ARIMA and GLARMA models. The number of weekly positive virological submissions is significantly higher in the fall calendar season compared to the summer calendar season. 
Positive counts at weekly and monthly intervals demonstrated a significant increasing trend. Overall, this study shows that the RF model offers enhanced prediction ability over the ARIMA and GLARMA time series models for predicting the frequency of IAV infections in diagnostic submissions.
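The comparison against a seasonal-naive baseline using root mean square error can be illustrated with the short Python sketch below. The counts are invented, the function names are hypothetical, and normalizing the RMSE by the mean of the observed values is an assumption: several normalizations are in use and the abstract does not say which one the study applied.

```python
import math


def seasonal_naive(history, season_length, horizon):
    """Forecast each step with the value observed one season earlier."""
    start = len(history) - season_length
    return [history[start + (h % season_length)] for h in range(horizon)]


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def nrmse(actual, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    return rmse(actual, predicted) / (sum(actual) / len(actual))


# Hypothetical quarterly counts of positive submissions, two full "years".
history = [3, 5, 8, 4, 4, 6, 9, 5]
actual_next_year = [5, 6, 11, 5]

baseline = seasonal_naive(history, season_length=4, horizon=4)
print("seasonal-naive forecast:", baseline)
print("RMSE:", rmse(actual_next_year, baseline))
print("NRMSE:", nrmse(actual_next_year, baseline))
```

A candidate model's improvement over the baseline can then be read off by computing the same metrics for its forecasts and comparing them with these values.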
Project description:Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current paper presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(\beta)$ and adds the penalty $\frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\beta$ to the sparsity set $S_k$ where at most $k$ components of $\beta$ are nonzero. If $\beta_\rho$ represents the minimum of the objective $f_\rho(\beta) = L(\beta) + \frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$, then $\beta_\rho$ tends to the constrained minimum of $L(\beta)$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power.
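The distance penalty in the proximal distance objective is easy to evaluate, because the Euclidean projection onto the sparsity set keeps the k largest-magnitude components and zeroes the rest. The Python sketch below shows that projection and the penalized objective. It assumes the penalty carries a factor of ρ/2, and the function names, the placeholder loss, and the numbers are all hypothetical. It does not implement either of the paper's two algorithms, only the objective they work with.

```python
def project_onto_sparsity_set(beta, k):
    """Euclidean projection onto {vectors with at most k nonzero components}."""
    by_magnitude = sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)
    keep = set(by_magnitude[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def squared_distance_to_sparsity_set(beta, k):
    """dist(beta, S_k)^2: sum of squares of the components the projection zeroes."""
    projected = project_onto_sparsity_set(beta, k)
    return sum((b - p) ** 2 for b, p in zip(beta, projected))


def penalized_objective(loss, beta, k, rho):
    """f_rho(beta) = L(beta) + (rho / 2) * dist(beta, S_k)^2."""
    return loss(beta) + 0.5 * rho * squared_distance_to_sparsity_set(beta, k)


# Hypothetical parameter vector and a placeholder loss (not an SVM loss).
beta = [3.0, -0.5, 0.1, -2.0]


def zero_loss(_beta):
    return 0.0


print("projection:", project_onto_sparsity_set(beta, k=2))
print("squared distance:", squared_distance_to_sparsity_set(beta, k=2))
print("objective at rho=10:", penalized_objective(zero_loss, beta, k=2, rho=10.0))
```

As ρ grows, any nonzero distance to the sparsity set is penalized more heavily, which is why the minimizer is pushed toward a vector with at most k nonzero components.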
Project description:The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the theory of multivariate time series; however, identifiability issues have led practitioners to abandon it in favor of the simpler but more restrictive Vector AutoRegressive (VAR) model. We narrow this gap with a new optimization-based approach to VARMA identification built upon the principle of parsimony. Among all equivalent data-generating models, we use convex optimization to seek the parameterization that is simplest in a certain sense. A user-specified strongly convex penalty is used to measure model simplicity, and that same penalty is then used to define an estimator that can be efficiently computed. We establish consistency of our estimators in a double-asymptotic regime. Our non-asymptotic error bound analysis accommodates both model specification and parameter estimation steps, a feature that is crucial for studying large-scale VARMA algorithms. Our analysis also provides new results on penalized estimation of infinite-order VAR, and elastic net regression under a singular covariance structure of regressors, which may be of independent interest. We illustrate the advantage of our method over VAR alternatives on three real data examples.
Project description:The Support Vector Machine (SVM) is a very popular classification tool with many successful applications. It was originally designed for binary problems with desirable theoretical properties. Although there exist various Multicategory SVM (MSVM) extensions in the literature, some challenges remain. In particular, most existing MSVMs make use of k classification functions for a k-class problem, and the corresponding optimization problems are typically handled by existing quadratic programming solvers. In this paper, we propose a new group of MSVMs, namely the Reinforced Angle-based MSVMs (RAMSVMs), using an angle-based prediction rule with k - 1 functions directly. We prove that RAMSVMs can enjoy Fisher consistency. Moreover, we show that the RAMSVM can be implemented using the very efficient coordinate descent algorithm on its dual problem. Numerical experiments demonstrate that our method is highly competitive in terms of computational speed, as well as classification prediction performance. Supplemental materials for the article are available online.
Project description:Background: R2*-MRI is clinically used to noninvasively assess hepatic iron content (HIC) to guide potential iron chelation therapy. However, coexisting pathologies, such as fibrosis and steatosis, affect R2* measurements and may thus confound HIC estimations. Purpose: To evaluate whether a multispectral autoregressive moving average (ARMA) model can be used in conjunction with quantitative susceptibility mapping (QSM) to measure magnetic susceptibility as a confounder-free predictor of HIC. Study type: Phantom study and in vivo cohort. Subjects: Nine iron phantoms covering clinically relevant R2* range (20-1200/second) and 48 patients (22 male, 26 female, median age 18 years). Field strength/sequence: Three-dimensional (3D) and two-dimensional (2D) multi-echo gradient echo (GRE) at 1.5 T. Assessment: ARMA-QSM modeling was performed on the complex 3D GRE signal to estimate R2*, fat fraction (FF), and susceptibility measurements. R2*-based dry clinical HIC values were calculated from the 2D GRE acquisition using a published R2*-HIC calibration curve as reference standard. Statistical tests: Linear regression analysis was performed to compare ARMA R2* and susceptibility-based estimates to iron concentrations and dry clinical HIC values in phantoms and patients, respectively. Results: In phantoms, the ARMA R2* and susceptibility values strongly correlated with iron concentrations (R² ≥ 0.9). In patients, the ARMA R2* values highly correlated (R² = 0.97) with clinical HIC values with slope = 0.026, and the susceptibility values showed good correlation (R² = 0.82) with clinical dry HIC values with slope = 3.3 and produced a dry-to-wet HIC ratio of 4.8. Data conclusion: This study demonstrates the feasibility of using ARMA-QSM to simultaneously estimate susceptibility-based wet HIC, R2*-based dry HIC and FFs from a single multi-echo GRE acquisition.
Our results demonstrate that both R2* and susceptibility-based wet HIC values estimated with ARMA-QSM showed good association with clinical dry HIC values, with slopes similar to the published R2*-biopsy HIC calibration and the dry-to-wet tissue weight ratio, respectively. Hence, our study shows that ARMA-QSM can provide potentially confounder-free assessment of hepatic iron overload. Level of evidence: 3. Technical Efficacy: Stage 2.
Project description:Objectives: To predict the number of people with diabetes and estimate the economic burden in China. Methods: Data from natural logarithmic transformation of the number of people with diabetes in China from 2000 to 2018 were selected to fit the autoregressive integrated moving average (ARIMA) model, and 2019 data were used to test it. The bottom-up and human capital approaches were chosen to estimate the direct and indirect economic burden of diabetes respectively. Results: The number of people with diabetes in China would increase in the future. The ARIMA model fitted and predicted well. The number of people with diabetes from 2020 to 2025 would be about 94, 96, 97, 98, 99 and 100 m respectively. The economic burden of diabetes from 2019 to 2025 would be about $156b, $160b, $163b, $165b, $167b, $169b and $170b respectively. Conclusion: The situation of diabetes in China is serious. The ARIMA model can be used to predict the number of people with diabetes. We should allocate health resources in a rational manner to improve the prevention and control of diabetes.