
Dataset Information


Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices.


ABSTRACT: While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
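The sketch below is an illustrative (not the authors') Python rendering of the three ingredients the abstract names: mini-batch gradient averaging, a rank-r decomposition of the accumulated gradient, and stochastic rounding of the quantized update, with the two transfer schemes (rank-sum vs. rank-seq) shown side by side. Truncated SVD is used here as a simple stand-in for the streaming batch PCA / NMF factorizations studied in the paper; all function names, shapes, and the device update step are assumptions for illustration only.

```python
import numpy as np

def minibatch_gradient(X, delta):
    """Average the outer-product gradient over a mini-batch.
    X: (batch, n_in) layer inputs; delta: (batch, n_out) backpropagated errors.
    Returns the (n_in, n_out) averaged gradient X^T delta / batch."""
    return X.T @ delta / X.shape[0]

def low_rank_factors(G, rank):
    """Rank-r approximation of the gradient via truncated SVD
    (a stand-in for the streaming batch PCA / NMF factorizations in the paper)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]   # A: (n_in, r), B: (r, n_out)

def stochastic_round(dW, step):
    """Round dW to integer multiples of the device update step, rounding up or
    down with probability proportional to the remainder, so small-but-frequent
    updates are not silently dropped (the vanishing-update problem)."""
    q = dW / step
    lo = np.floor(q)
    return step * (lo + (np.random.rand(*q.shape) < (q - lo)))

def apply_rank_sum(W, A, B, lr, step):
    """'rank-sum': reconstruct the full gradient outside the array,
    then write a single rounded update."""
    return W - stochastic_round(lr * (A @ B), step)

def apply_rank_seq(W, A, B, lr, step):
    """'rank-seq': stream the rank-1 components to the array one at a time,
    rounding each partial outer-product update as it is applied."""
    for k in range(A.shape[1]):
        W = W - stochastic_round(lr * np.outer(A[:, k], B[k, :]), step)
    return W

# Example usage with MNIST-like layer sizes (all values are placeholders).
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 128))
X = rng.standard_normal((64, 784))
delta = rng.standard_normal((64, 128))
G = minibatch_gradient(X, delta)
A, B = low_rank_factors(G, rank=3)
W = apply_rank_seq(W, A, B, lr=0.1, step=1e-3)
```

Note the memory trade-off the abstract alludes to: storing the factors A and B costs r*(n_in + n_out) values instead of the n_in*n_out values of a full gradient accumulator, which is why only 3 to 10 ranks yields significant savings.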

SUBMITTER: Zhao J 

PROVIDER: S-EPMC8645649 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices.

Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D. Hoskins, Gina C. Adam

Frontiers in Neuroscience, 2021-11-22


Similar Datasets

| S-EPMC7358558 | biostudies-literature
| S-EPMC11267880 | biostudies-literature
| S-EPMC10565802 | biostudies-literature
| S-EPMC10811689 | biostudies-literature
| S-EPMC11244533 | biostudies-literature
| S-EPMC8100137 | biostudies-literature
| S-EPMC7901913 | biostudies-literature
| S-EPMC7010779 | biostudies-literature
| S-EPMC8967462 | biostudies-literature
| S-EPMC11461475 | biostudies-literature