Dataset Information

Shaping the learning landscape in neural networks around wide flat minima.

ABSTRACT: Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How these 2 features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
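As a rough illustration of what "wide" and "flat" mean for a minimizer of the error loss, the sketch below trains a single-layer network on random patterns and then checks how quickly the training error rises when the weights are perturbed. It is not the authors' procedure: the perceptron update rule, the Gaussian perturbation, and all function names and parameter values are assumptions chosen only to keep the example small.

```python
import numpy as np


def random_patterns(n_patterns, n_inputs, rng):
    """Random +/-1 inputs with random +/-1 labels (illustrative setup)."""
    x = rng.choice([-1.0, 1.0], size=(n_patterns, n_inputs))
    y = rng.choice([-1.0, 1.0], size=n_patterns)
    return x, y


def error_rate(w, x, y):
    """Error loss: fraction of patterns misclassified by sign(x @ w)."""
    return float(np.mean(np.sign(x @ w) != y))


def train_perceptron(x, y, epochs=200):
    """Classic perceptron rule (an assumption, not the paper's algorithm)."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(x, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:  # every pattern is classified correctly
            break
    return w


def flatness_profile(w, x, y, sigmas, n_samples, rng):
    """Mean error rate after adding Gaussian noise of relative size sigma.

    A profile that stays near zero for larger sigma suggests a wider,
    flatter region around w; a fast rise suggests a narrow minimum.
    """
    scale = np.linalg.norm(w) / np.sqrt(len(w))  # typical weight magnitude
    profile = []
    for sigma in sigmas:
        errs = [
            error_rate(w + sigma * scale * rng.standard_normal(len(w)), x, y)
            for _ in range(n_samples)
        ]
        profile.append(float(np.mean(errs)))
    return profile


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = random_patterns(n_patterns=50, n_inputs=200, rng=rng)
    w = train_perceptron(x, y)
    sigmas = [0.0, 0.25, 0.5, 1.0]
    for s, e in zip(sigmas, flatness_profile(w, x, y, sigmas, 100, rng)):
        print(f"sigma={s:.2f}  mean error={e:.3f}")
```

Comparing such profiles for solutions reached by different training procedures gives a crude, sampling-based proxy for the volume of the surrounding minimum; it does not replace the entropy-based analysis the abstract refers to.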

SUBMITTER: Baldassi C

PROVIDER: S-EPMC6955380 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Shaping the learning landscape in neural networks around wide flat minima.

Baldassi C, Pittorino F, Zecchina R

Proceedings of the National Academy of Sciences of the United States of America, 2019-12-23, issue 1

PMID: 31871189

Similar Datasets

Learning the local landscape of protein structures with convolutional neural networks. | S-EPMC8603988 | biostudies-literature
Shaping embodied neural networks for adaptive goal-directed behavior. | S-EPMC2265558 | biostudies-literature
Nonequilibrium landscape theory of neural networks. | S-EPMC3831465 | biostudies-literature
How synchronized human networks escape local minima. | S-EPMC11519520 | biostudies-literature
Shaping Early Reorganization of Neural Networks Promotes Motor Function after Stroke. | S-EPMC4869817 | biostudies-literature
Interpreting wide-band neural activity using convolutional neural networks. | S-EPMC8328518 | biostudies-literature
Flat-Top Line-Shaped Beam Shaping and System Design. | S-EPMC9185535 | biostudies-literature
The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. | S-EPMC7936325 | biostudies-literature
Learning cellular morphology with neural networks. | S-EPMC6588634 | biostudies-literature
A mean field view of the landscape of two-layer neural networks. | S-EPMC6099898 | biostudies-other