Dataset Information

Optimal errors and phase transitions in high-dimensional generalized linear models.

ABSTRACT: Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

SUBMITTER: Barbier J

PROVIDER: S-EPMC6431156 | biostudies-literature | 2019 Mar

REPOSITORIES: biostudies-literature

Publications

Optimal errors and phase transitions in high-dimensional generalized linear models.

Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová

Proceedings of the National Academy of Sciences of the United States of America, 2019-03-01, issue 12

PMID: 30824595

Similar Datasets

Linear hypothesis testing for high dimensional generalized linear models. | S-EPMC6750760 | biostudies-literature

Transfer Learning under High-dimensional Generalized Linear Models. | S-EPMC10982637 | biostudies-literature

Testing generalized linear models with high-dimensional nuisance parameter. | S-EPMC9933885 | biostudies-literature

A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models. | S-EPMC7425805 | biostudies-literature

Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes. | S-EPMC10292730 | biostudies-literature

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models | S-EPMC10907007 | biostudies-literature

Markov neighborhood regression for statistical inference of high-dimensional generalized linear models. | S-EPMC9427730 | biostudies-literature

Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach. | S-EPMC8442657 | biostudies-literature

An Information Matrix Prior for Bayesian Analysis in Generalized Linear Models with High Dimensional Data. | S-EPMC2909687 | biostudies-literature

Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data. | S-EPMC9907224 | biostudies-literature