Dataset Information

Application of one-, three-, and seven-day forecasts during early onset on the COVID-19 epidemic dataset using moving average, autoregressive, autoregressive moving average, autoregressive integrated moving average, and naive forecasting methods.

ABSTRACT: The coronavirus disease 2019 (COVID-19) has spread rapidly across the world since its appearance in December 2019. This data set contains one-, three-, and seven-day forecasts of the COVID-19 pandemic's cumulative case counts at the county, health district, and state geographic levels for the state of Virginia. Forecasts are created over the first 46 days of reported COVID-19 cases using the cumulative case count data provided by The New York Times as of April 22, 2020. From this historical data, the one, three, seven, and all days prior to each forecast start date are used as lookback windows to generate the forecasts. Forecasts are created using: (1) a naïve approach; (2) Holt-Winters exponential smoothing (HW); (3) growth rate (Growth); (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are computed for each forecast to evaluate it against the reported historical data. These error metrics are aggregated to assess which combination of forecast method, forecast length, and lookback length fits best, based on the lowest aggregated error at each geographic level. The data set comprises an R Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables. All code and data files are included to provide transparency and to facilitate replicability and reproducibility. The package opens directly in RStudio through the R Project file, which removes the need to set path locations for the folders contained within the data set and so simplifies setup.
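
The evaluation loop described above (rolling forecast start dates, lookback windows, per-forecast error metrics, and selection of the lowest aggregated error) can be sketched in Python. This is not the dataset's R code. It implements only the naïve method (carry the last observed cumulative count forward) and a hypothetical growth-rate rule (the mean day-over-day ratio within the lookback window); it uses the standard median-based definitions of MdAE and MdAPE; and, as an assumption, it aggregates per-forecast MdAE by taking the median across forecast start dates. The names `naive_forecast`, `growth_forecast`, `mdae`, `mdape`, and `best_combination` are illustrative.

```python
from statistics import median


def naive_forecast(history, horizon):
    """Naive method: carry the last observed cumulative count forward."""
    return [history[-1]] * horizon


def growth_forecast(history, horizon, lookback):
    """Hypothetical growth-rate rule (an assumption, not the dataset's R code):
    apply the mean day-over-day ratio seen in the last `lookback` days."""
    window = history[-(lookback + 1):]
    ratios = [b / a for a, b in zip(window, window[1:]) if a > 0]
    rate = sum(ratios) / len(ratios) if ratios else 1.0
    forecast, last = [], history[-1]
    for _ in range(horizon):
        last *= rate
        forecast.append(last)
    return forecast


def mdae(actual, forecast):
    """Median Absolute Error over the forecast horizon."""
    return median(abs(a - f) for a, f in zip(actual, forecast))


def mdape(actual, forecast):
    """Median Absolute Percentage Error; days with zero reported cases are
    skipped to avoid division by zero (an assumed convention)."""
    errors = [abs(a - f) / abs(a) * 100 for a, f in zip(actual, forecast) if a != 0]
    return median(errors) if errors else float("nan")


def best_combination(series, horizons=(1, 3, 7), lookbacks=(1, 3, 7, None)):
    """Score each (method, horizon, lookback) over rolling forecast start dates
    and return the combination with the lowest median MdAE.
    A lookback of None means "all days prior to the start date"."""
    scores = {}
    for h in horizons:
        for lb in lookbacks:
            for method in ("naive", "growth"):
                errors = []
                # Start at t=2 so every history has at least two observations.
                for t in range(2, len(series) - h + 1):
                    history, actual = series[:t], series[t:t + h]
                    k = len(history) - 1 if lb is None else lb
                    if method == "naive":
                        fc = naive_forecast(history, h)
                    else:
                        fc = growth_forecast(history, h, k)
                    errors.append(mdae(actual, fc))
                if errors:  # horizons longer than the series are skipped
                    scores[(method, h, lb)] = median(errors)
    best = min(scores, key=scores.get)
    return best, scores
```

For an invented, exactly doubling series such as `[1, 2, 4, 8, 16, 32, 64, 128]`, `best_combination` selects a growth-rate combination, because that rule reproduces a constant daily ratio with zero error while the naïve forecast always lags behind a rising cumulative count.
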
This data set provides two avenues for reproducing the results: 1) use the provided code to generate the forecasts from scratch and then run the analyses; or 2) load the saved forecast data and run the analyses on the stored data. Code annotations provide the instructions needed for both routes. The same set of forecasts and error metrics can be generated for any US state by altering the state parameter within the source code. Users can also generate health district forecasts for any other state by providing a file that maps each county within the state to its respective health district. Connecting the source code to the most up-to-date version of The New York Times COVID-19 dataset allows forecasts to be generated up to the most recently reported data, facilitating near real-time forecasting.
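
The state filter and health-district step described above amount to keeping one state's county rows and summing their cumulative counts by district. The following minimal Python sketch assumes the column names of The New York Times county-level file (`date`, `county`, `state`, `fips`, `cases`, `deaths`). The function names and the in-memory `county_to_district` dictionary are hypothetical stand-ins for the dataset's R code and its county-to-health-district mapping file.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a county-level CSV (e.g. a local copy of the NYT us-counties.csv)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def district_series(rows, state, county_to_district):
    """Sum cumulative county case counts into per-district daily totals.

    rows: dicts with 'date', 'county', 'state', and 'cases' keys, the
          column names used in the NYT county-level file.
    state: the state to keep, playing the role of the "state parameter".
    county_to_district: hypothetical {county name: health district} mapping,
          standing in for the county-to-health-district file.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row["state"] != state:
            continue  # keep only the selected state
        district = county_to_district.get(row["county"])
        if district is None:
            continue  # counties absent from the mapping are skipped
        totals[district][row["date"]] += int(row["cases"])
    # ISO dates (YYYY-MM-DD) sort chronologically as strings.
    return {d: sorted(by_date.items()) for d, by_date in totals.items()}
```

Hypothetical usage: `district_series(load_rows("us-counties.csv"), "Virginia", mapping)`, where `mapping` is built from the county-to-district file. Keying on county names is a simplification; the `fips` column is likely a sturdier join key, since the spellings in a mapping file may not match the NYT's exactly.
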

SUBMITTER: Lynch CJ

PROVIDER: S-EPMC7834853 | biostudies-literature | 2021 Apr

REPOSITORIES: biostudies-literature


Publications

Application of one-, three-, and seven-day forecasts during early onset on the COVID-19 epidemic dataset using moving average, autoregressive, autoregressive moving average, autoregressive integrated moving average, and naïve forecasting methods.

Christopher J. Lynch, Ross Gore

Data in Brief, 15 January 2021

PMID: 33521186

Similar Datasets

Project description: Background: Mathematical and statistical models are used to predict trends in epidemic spread and determine the effectiveness of control measures. Autoregressive integrated moving average (ARIMA) models are used for time-series forecasting, but only a few models of the 2019 coronavirus disease (COVID-19) pandemic have incorporated protective behaviors or vaccination, which are known to be effective for pandemic control. Methods: To improve the accuracy of prediction, we applied newly developed ARIMA models with predictors (mask wearing, avoiding going out, and vaccination) to forecast weekly COVID-19 case growth rates in Canada, France, Italy, and Israel between January 2021 and March 2022. The open-source data were sourced from the YouGov survey and Our World in Data. Prediction performance was evaluated using the root mean square error (RMSE) and the corrected Akaike information criterion (AICc). Results: A model with mask wearing and vaccination variables performed best for the pandemic period in which the Alpha and Delta viral variants were predominant (before November 2021). A model using only past case growth rates as autoregressive predictors performed best for the Omicron period (after December 2021). The models suggested that protective behaviors and vaccination are associated with a reduction in COVID-19 case growth rates, with booster vaccine coverage playing a particularly vital role during the Omicron period. For example, each unit increase in mask wearing and avoiding going out significantly reduced the case growth rate during the Alpha/Delta period in Canada (-0.81 and -0.54, respectively; both p < 0.05).
In the Omicron period, each unit increase in the number of booster doses resulted in a significant reduction of the case growth rate in Canada (-0.03), Israel (-0.12), Italy (-0.02), and France (-0.03); all p < 0.05. Conclusions: The key findings of this study are that incorporating behavior and vaccination as predictors led to accurate predictions and highlighted their significant role in controlling the pandemic. These models are easily interpretable and can be embedded in a "real-time" schedule with weekly data updates. They can support timely decision making about policies to control dynamically changing epidemics.

Project description: Background: Measuring hepatic R2* by fitting a monoexponential model to the signal decay of a multigradient-echo (mGRE) sequence noninvasively determines hepatic iron content (HIC). Concurrent hepatic steatosis introduces signal oscillations and confounds R2* quantification with standard monoexponential models. Purpose: To evaluate an autoregressive moving average (ARMA) model for accurate quantification of HIC in the presence of fat, using biopsy as the reference. Study type: Phantom study and in vivo cohort. Population: Twenty iron-fat phantoms covering clinically relevant R2* (30-800 s⁻¹) and fat fraction (FF) ranges (0-40%), and 10 patients (four male, six female; mean age 18.8 years). Field strength/sequence: 2D mGRE acquisitions at 1.5 T and 3 T. Assessment: Phantoms were scanned at both field strengths. In vivo data were analyzed using the ARMA model to determine R2* and FF values, which were compared with biopsy results. Statistical tests: Linear regression analysis was used to compare ARMA R2* and FF results with those obtained using a conventional monoexponential model, a complex-domain nonlinear least squares (NLSQ) fat-water model, and biopsy. Results: In phantoms and in vivo, all models produced R2* and FF values consistent with expected values in low-iron and low/high-fat conditions. For high-iron, no-fat phantoms, the monoexponential and ARMA models performed excellently (slopes: 0.89-1.07), but NLSQ overestimated R2* (slopes: 1.14-1.36) and produced false FFs (12-17%) at 1.5 T; in high-iron, fat-containing phantoms, NLSQ (slopes: 1.02-1.16) outperformed the monoexponential and ARMA models (slopes: 1.23-1.88). The results with NLSQ and ARMA improved in phantoms at 3 T (slopes: 0.96-1.04).
In patients, mean R2*-HIC estimates for the monoexponential and ARMA models were close to biopsy-HIC values (slopes: 0.90-0.95), whereas NLSQ substantially overestimated HIC (slope 1.4) and produced false FF values (4-28%) with very high SDs (15-222%) in patients with high iron overload and no steatosis. Data conclusion: ARMA is superior in quantifying R2* and FF under high-iron, no-fat conditions, whereas NLSQ is superior for high iron with concurrent fat at 1.5 T. Both models give improved R2* and FF results at 3 T. Level of evidence: 2. Technical Efficacy Stage: 2. J. Magn. Reson. Imaging 2019;50:1620-1632.

Project description: Influenza A virus commonly circulating in swine (IAV-S) is characterized by large genetic and antigenic diversity; thus, improvements in several aspects of IAV-S surveillance are needed to achieve desirable surveillance goals, such as the capacity to forecast as accurately as possible the number of influenza cases likely to arise. Advancements in modeling approaches provide the opportunity to use different models for surveillance. However, in order to improve surveillance, it is necessary to assess the predictive ability of such models. This study compares the sensitivity and predictive accuracy of the autoregressive integrated moving average (ARIMA) model, the generalized linear autoregressive moving average (GLARMA) model, and the random forest (RF) model with respect to the frequency of influenza A virus (IAV) in Ontario swine. Diagnostic data on IAV submissions in Ontario swine between 2007 and 2015 were obtained from the Animal Health Laboratory (University of Guelph, Guelph, ON, Canada). Each modeling approach was examined for predictive accuracy, evaluated by the root mean square error, the normalized root mean square error, and the model's ability to anticipate increases and decreases in disease frequency. Likewise, we verified the magnitude of improvement offered by the ARIMA, GLARMA, and RF models over a seasonal-naïve method. Using the diagnostic submissions, the occurrence of seasonality and the long-term trend in IAV infections were also investigated. The RF model had the smallest root mean square error in the prospective analysis and tended to predict increases in the number of diagnostic submissions and positive virological submissions at weekly and monthly intervals with a higher degree of sensitivity than the ARIMA and GLARMA models. The number of weekly positive virological submissions is significantly higher in the fall calendar season than in the summer calendar season.
Positive counts at weekly and monthly intervals demonstrated a significant increasing trend. Overall, this study shows that the RF model offers enhanced prediction ability over the ARIMA and GLARMA time series models for predicting the frequency of IAV infections in diagnostic submissions.

Project description: Background: Interrupted time series (ITS) analysis is an increasingly used method for assessing the impact of interventions on diseases. However, how the COVID-19 outbreak affected gonorrhea remains unstudied. This study aimed to evaluate the effect of COVID-19 on gonorrhea and to predict gonorrhea epidemics using the ITS-autoregressive integrated moving average (ARIMA) model. Methods: The number of gonorrhea cases reported in China from January 2005 to September 2022 was collected. Statistical descriptions were applied to characterize the overall epidemiological features of the data, and then the ITS-ARIMA model was established. Additionally, we compared the forecasting ability of ITS-ARIMA with that of Bayesian structural time series (BSTS), and discussed the model selection process, transfer function, model-fit checking, and interpretation of results. Results: During 2005-2022, the total number of gonorrhea cases was 2,165,048, with an annual average incidence rate of 8.99 per 100,000 people. The highest incidence rate was 14.2 per 100,000 people in 2005 and the lowest was 6.9 per 100,000 people in 2012. The optimal model was ARIMA(0,1,(1,3))(0,1,1)[12] (Akaike's information criterion = 3293.93). When predicting gonorrhea incidence, the mean absolute percentage error under ARIMA (16.45%) was smaller than that under BSTS (22.48%). The study found a 62.4% reduction in gonorrhea during the first-level response, a 46.47% reduction during the second-level response, and an increase of 3.6% during the third-level response. The final model estimated a step change of -2171 (95% confidence interval [CI] -3698 to -644) cases and an impulse change of -1359 (95% CI -2381 to -338) cases.
Using ITS-ARIMA to evaluate the effect of COVID-19 on gonorrhea, gonorrhea incidence in China showed a temporary decline before rebounding to pre-COVID-19 levels. Conclusion: ITS analysis is a valuable tool for gauging intervention effectiveness, providing flexibility in modelling various impacts. The ITS-ARIMA model can adeptly explain potential trends, autocorrelation, and seasonality. Gonorrhea, marked by periodicity and seasonality, exhibited a downward trend under the influence of COVID-19 interventions. ITS-ARIMA outperformed BSTS, offering superior predictive capability for the gonorrhea incidence trend in China.
