
Dataset Information


Developing reliable hourly electricity demand data through screening and imputation.


ABSTRACT: Electricity usage (demand) data are used by utilities, governments, and academics to model electric grids for a variety of planning purposes (e.g., capacity expansion and system operation). The U.S. Energy Information Administration collects hourly demand data from all balancing authorities (BAs) in the contiguous United States. As of September 2019, we find that 2.2% of the demand data in its database are missing. Additionally, 0.5% of reported quantities are either negative or otherwise identified as outliers. With the goal of obtaining non-missing, continuous, and physically plausible demand data to facilitate analysis, we developed a screening process to identify anomalous values. We then applied a Multiple Imputation by Chained Equations (MICE) technique to impute replacements for missing and anomalous values. We cross-validated the MICE technique by marking subsets of plausible data as missing and using the remaining data to predict these "missing" values. The mean absolute percentage error of imputed values is 3.5% across all BAs. The cleaned data are published and available open access: https://doi.org/10.5281/zenodo.3690240.
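The abstract describes three steps: screen hourly demand for anomalies, impute missing and anomalous values with chained equations, and cross-validate by masking plausible values and scoring the imputations with mean absolute percentage error (MAPE). The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' code. The screening rules (negative values plus a hypothetical 10x-median cutoff) are placeholders rather than the paper's actual criteria. `chained_impute` is a simplified, single, deterministic chained-equations pass using ordinary least squares across BA columns, not full MICE with multiple stochastic imputations. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def screen(series: np.ndarray, factor: float = 10.0) -> np.ndarray:
    """Return a copy of one BA's hourly demand with anomalous values set to NaN.

    Hypothetical rules for illustration only: negative values, and values
    larger than `factor` times the series median, are treated as anomalous.
    """
    cleaned = series.astype(float).copy()
    cleaned[cleaned < 0] = np.nan
    median = np.nanmedian(cleaned)
    cleaned[cleaned > factor * median] = np.nan
    return cleaned


def chained_impute(X: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Simplified chained-equations imputation (hours x BAs matrix).

    Each column with missing entries is regressed (OLS) on all other columns,
    cycling over columns for n_iter rounds. Single deterministic imputation,
    not the multiple stochastic draws of full MICE.
    """
    X = X.astype(float)
    missing = np.isnan(X)
    filled = X.copy()
    # Initialise every missing cell with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            # Design matrix: intercept plus the current values of other columns.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            filled[rows, j] = A[rows] @ coef
    return filled


def mask_and_score(X: np.ndarray, frac: float = 0.05, seed: int = 0) -> float:
    """Mask a random subset of plausible cells, impute them, and return MAPE (%)."""
    rng = np.random.default_rng(seed)
    idx = np.argwhere(~np.isnan(X))  # only plausible (non-missing) cells
    n_test = max(1, int(frac * len(idx)))
    chosen = idx[rng.choice(len(idx), size=n_test, replace=False)]
    X_masked = X.copy()
    X_masked[chosen[:, 0], chosen[:, 1]] = np.nan
    imputed = chained_impute(X_masked)
    truth = X[chosen[:, 0], chosen[:, 1]]
    pred = imputed[chosen[:, 0], chosen[:, 1]]
    return float(np.mean(np.abs(pred - truth) / np.abs(truth)) * 100)


# Synthetic example: 30 days of hourly demand for three correlated BAs.
hours = np.arange(24 * 30)
base = 1000 + 200 * np.sin(2 * np.pi * hours / 24)
rng = np.random.default_rng(42)
X = np.column_stack([base * s + rng.normal(0, 10, hours.size) for s in (1.0, 0.5, 2.0)])
X[5, 0] = -50.0   # injected negative value
X[100, 1] = 1e6   # injected spike
X_screened = np.column_stack([screen(X[:, j]) for j in range(X.shape[1])])
print(int(np.isnan(X_screened).sum()))  # 2
mape = mask_and_score(X_screened)
print(f"MAPE: {mape:.2f}%")
```

Regressing each BA on the others exploits the strong co-movement of neighboring demand series; a production pipeline would more likely use an established implementation such as scikit-learn's `IterativeImputer` and add time-of-day or lagged features.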

SUBMITTER: Ruggles TH 

PROVIDER: S-EPMC7250876 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature


Publications

Developing reliable hourly electricity demand data through screening and imputation.

Ruggles Tyler H, Farnham David J, Tong Dan, Caldeira Ken

Scientific Data, 26 May 2020, issue 1



Similar Datasets

| S-EPMC8024589 | biostudies-literature
| S-EPMC5927419 | biostudies-literature
| S-EPMC4432654 | biostudies-literature
| S-EPMC8282868 | biostudies-literature
| S-EPMC7331730 | biostudies-literature
| S-EPMC4568275 | biostudies-literature
| S-EPMC9800174 | biostudies-literature
| S-EPMC9130238 | biostudies-literature
| S-EPMC10023814 | biostudies-literature
| S-EPMC5981650 | biostudies-literature