Dataset Information

Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference.

ABSTRACT: Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to or better than other neural architecture search techniques, like Bayesian optimization, in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
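The two operations named in the abstract can be illustrated with a toy sketch. The code below is not the authors' quantization-aware pruning procedure, which applies pruning during quantization-aware training of a full network; it only shows, on a flat list of example weights, what "removing insignificant synapses" (here assumed to mean magnitude-based pruning) and "reducing the precision" (here assumed to mean a symmetric uniform grid) can look like. The function names `magnitude_prune` and `fake_quantize`, the weight values, the sparsity and the bit width are all hypothetical choices made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def fake_quantize(weights, bits):
    """Round weights onto a symmetric uniform grid representable with `bits` bits."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (one is the sign)")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1      # positive levels on each side of zero
    scale = max_abs / levels          # grid spacing
    return [round(w / scale) * scale for w in weights]


# Hypothetical example weights, not taken from the paper.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.12]
pruned = magnitude_prune(weights, sparsity=0.5)
quantized = fake_quantize(pruned, bits=3)
print(pruned)
print(quantized)
```

Pruned weights stay exactly zero after quantization because zero lies on the symmetric grid, which is one reason the two techniques can be combined without undoing each other in this simplified setting.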

SUBMITTER: Hawks B

PROVIDER: S-EPMC8299073 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference.

Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

Frontiers in Artificial Intelligence, 2021-07-09


PMID: 34308339

Similar Datasets

Project description: During the stringent response, bacteria synthesize guanosine-3',5'-bis(diphosphate) (ppGpp) and guanosine-5'-triphosphate 3'-diphosphate (pppGpp), which act as secondary messengers to promote cellular survival and adaptation. (p)ppGpp 'alarmones' are synthesized and/or hydrolyzed by proteins belonging to the RelA/SpoT Homologue (RSH) family. Many bacteria also encode 'small alarmone synthetase' (SAS) proteins (e.g. RelP, RelQ) which may also be capable of synthesizing a third alarmone: guanosine-5'-phosphate 3'-diphosphate (pGpp). Here, we report the biochemical properties of the Rel (RSH), RelP and RelQ proteins from Staphylococcus aureus (Sa-Rel, Sa-RelP, Sa-RelQ, respectively). Sa-Rel synthesized pppGpp more efficiently than ppGpp, but lacked the ability to produce pGpp. Sa-Rel efficiently hydrolyzed all three alarmones in a Mn(II) ion-dependent manner. The removal of the C-terminal regulatory domain of Sa-Rel increased its rate of (p)ppGpp synthesis ca. 10-fold, but had negligible effects on its rate of (pp)pGpp hydrolysis. Sa-RelP and Sa-RelQ efficiently synthesized pGpp in addition to pppGpp and ppGpp. The alarmone-synthesizing abilities of Sa-RelQ, but not Sa-RelP, were allosterically stimulated by the addition of pppGpp, ppGpp or pGpp. The respective (pp)pGpp-synthesizing activities of Sa-RelP/Sa-RelQ were compared and contrasted with SAS homologues from Enterococcus faecalis (Ef-RelQ) and Streptococcus mutans (Sm-RelQ, Sm-RelP). Results indicated that Ef-RelQ, Sm-RelQ and Sa-RelQ were functionally equivalent, but exhibited considerable variations in their respective biochemical properties, and the degrees to which alarmones and single-stranded RNA molecules allosterically modulated their respective alarmone-synthesizing activities. The respective (pp)pGpp-synthesizing capabilities of Sa-RelP and Sm-RelP proteins were inhibited by pGpp, ppGpp and pppGpp. Our results support the premise that RelP and RelQ proteins may synthesize pGpp in addition to (p)ppGpp within S. aureus and other Gram-positive bacterial species.

Project description: Preeclampsia (PE) is a hypertensive complication affecting 8-10% of US pregnancies annually. While there is no cure for PE, aspirin may reduce complications for those at high risk for PE. Furthermore, PE disproportionately affects racial minorities, with a higher burden of morbidity and mortality. Previous studies have shown early prediction of PE would allow for prevention. We approached the prediction of PE using a new method based on a cost-sensitive deep neural network (CSDNN) by considering the severe imbalance and sparse nature of the data, as well as racial disparities. We validated our model using large extant rich data sources that represent a diverse cohort of minority populations in the US. These include Texas Public Use Data Files (PUDF), Oklahoma PUDF, and the Magee Obstetric Medical and Infant (MOMI) databases. We identified the most influential clinical and demographic features (predictor variables) relevant to PE for both general populations and smaller racial groups. We also investigated the effectiveness of multiple network architectures using three hyperparameter optimization algorithms: Bayesian optimization, Hyperband, and random search. Our proposed models equipped with focal loss function yield superior and reliable prediction performance compared with the state-of-the-art techniques with an average area under the curve (AUC) of 66.3% and 63.5% for the Texas and Oklahoma PUDF respectively, while the CSDNN model with weighted cross-entropy loss function outperforms with an AUC of 76.5% for the MOMI data. Furthermore, our CSDNN model equipped with focal loss function leads to an AUC of 66.7% for Texas African American and 57.1% for Native American. The best results are obtained with 62.3% AUC with CSDNN with weighted cross-entropy loss function for Oklahoma African American, 58% AUC with DNN and balanced batch for Oklahoma Native American, and 72.4% AUC using either CSDNN with weighted cross-entropy loss function or CSDNN with focal loss with balanced batch method for MOMI African American dataset. Our results provide the first evidence of the predictive power of clinical databases for PE prediction among minority populations.

