Dataset Information

Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators.

ABSTRACT: Analog in-memory computing, a promising approach for energy-efficient acceleration of deep learning workloads, computes matrix-vector multiplications only approximately, due to nonidealities that are often non-deterministic or nonlinear. This can adversely impact the achievable inference accuracy. Here, we develop a hardware-aware retraining approach to systematically examine the accuracy of analog in-memory computing across multiple network topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a realistic crossbar model, we improve significantly on earlier retraining approaches. We show that many larger-scale deep neural networks (including convnets, recurrent networks, and transformers) can in fact be successfully retrained to show iso-accuracy with the floating point implementation. Our results further suggest that nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on accuracy, and that recurrent networks are particularly robust to all nonidealities.
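
As a rough illustration of what "approximate" matrix-vector multiplication means here, the following minimal Python sketch compares an exact product with a toy noisy one. The additive Gaussian noise model, the function names, and the parameters `in_noise`, `w_noise`, and `out_noise` are assumptions made for this example only; they are not the crossbar model or the retraining procedure described in the abstract.

```python
import random

def ideal_mvm(weights, x):
    """Exact matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def noisy_mvm(weights, x, in_noise=0.0, w_noise=0.0, out_noise=0.0, rng=None):
    """Toy analog MVM with additive Gaussian noise (illustrative assumption).

    in_noise, w_noise, and out_noise are standard deviations of the noise
    added to the inputs, the weights, and the outputs, respectively.
    """
    rng = rng or random.Random(0)
    # Perturb the input vector once, as if it were applied to the array.
    x_n = [xi + rng.gauss(0.0, in_noise) for xi in x]
    y = []
    for row in weights:
        # Perturb each weight as it is read, then accumulate.
        acc = sum((w + rng.gauss(0.0, w_noise)) * xi for w, xi in zip(row, x_n))
        # Perturb the accumulated output.
        y.append(acc + rng.gauss(0.0, out_noise))
    return y
```

Setting one noise parameter at a time and comparing against `ideal_mvm` gives a simple way to probe which noise source perturbs the result most for a given matrix and input.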

SUBMITTER: Rasch MJ

PROVIDER: S-EPMC10469175 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature

Publications

Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators.

Rasch MJ, Mackin C, Le Gallo M, Chen A, Fasoli A, Odermatt F, Li N, Nandakumar SR, Narayanan P, Tsai H, Burr GW, Sebastian A, Narayanan V

Nature Communications, 2023-08-30, issue 1

PMID: 37648721

Similar Datasets

Project description:

Background: Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphic cards, to develop high performance solutions for sequence alignment.

Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware.

Conclusions: The results show that graphic cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches.
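
To make the cost argument concrete, here is a minimal CPU-only Python sketch of the Smith-Waterman recurrence with a linear gap penalty. It is not the CUDA implementation described above; the function name and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration. The nested loops update one cell per pair of characters, which is where the cost proportional to the product of the two sequence lengths comes from.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Only two rows are kept in memory because the sketch returns the score alone; recovering the alignment itself would require storing the full table or traceback pointers.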
