The probabilistic backbone of data-driven complex networks: an example in climate.
ABSTRACT: Complex systems often exhibit long-range correlations, so that typical observables show statistical dependence across long distances. These teleconnections have a tremendous impact on the dynamics, as they provide channels for information transport across the system, and are particularly relevant in forecasting, control, and data-driven modeling of complex systems. These statistical interrelations among the many degrees of freedom are usually represented by the so-called correlation network, constructed by establishing links between variables (nodes) with pairwise correlations above a given threshold. Here, with the climate system as an example, we revisit correlation networks from a probabilistic perspective and show that they unavoidably include much redundant information, resulting in overfitted probabilistic (Gaussian) models. As an alternative, we propose the use of more sophisticated probabilistic Bayesian networks, developed by the machine learning community, as a data-driven modeling and prediction tool. Bayesian networks are built from data and include only the (pairwise and conditional) dependencies among the variables that are needed to explain the data (i.e., to maximize the likelihood of the underlying probabilistic Gaussian model). This results in much simpler, sparser, non-redundant networks that still encode the complex structure of the dataset, as revealed by standard complex network measures. Moreover, the networks are capable of generalizing to new data and constitute a truly probabilistic backbone of the system. When applied to climate data, Bayesian networks are shown to faithfully reveal the various long-range teleconnections relevant in the dataset, in particular those emerging in El Niño periods.
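ILLUSTRATION: The correlation-network construction described in the abstract (link two variables when their pairwise correlation exceeds a threshold) can be sketched as follows. This is a minimal sketch and not the authors' code: the function name `correlation_network`, the use of the absolute Pearson correlation, the threshold of 0.5, and the synthetic three-variable dataset are all assumptions made for illustration.

```python
import numpy as np


def correlation_network(data, threshold):
    """Boolean adjacency matrix of a thresholded correlation network.

    data: array of shape (n_samples, n_variables).
    Two variables are linked when the absolute value of their Pearson
    correlation exceeds `threshold` (taking the absolute value is an
    assumption of this sketch).
    """
    corr = np.corrcoef(data, rowvar=False)   # variables are columns
    adjacency = np.abs(corr) > threshold
    np.fill_diagonal(adjacency, False)       # no self-links
    return adjacency


# Synthetic data (hypothetical): x1 closely follows x0, x2 is independent noise.
rng = np.random.default_rng(0)
n_samples = 500
x0 = rng.normal(size=n_samples)
x1 = x0 + 0.1 * rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
data = np.column_stack([x0, x1, x2])

adj = correlation_network(data, threshold=0.5)
```

In this toy case only the pair (x0, x1) should end up linked. The threshold is a free parameter of the method: lowering it adds links, which is where the redundancy discussed in the abstract enters.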
SUBMITTER: Graafland CE
PROVIDER: S-EPMC7359351 | biostudies-literature | 2020 Jul
REPOSITORIES: biostudies-literature