ZTE Communications, 2024, Vol. 22, Issue (4): 9-17. DOI: 10.12142/ZTECOM.202404003

• Special Topic •

Research on High-Precision Stochastic Computing VLSI Structures for Deep Neural Network Accelerators

WU Jingguo1, ZHU Jingwei1, XIONG Xiankui2,3, YAO Haidong2,3, WANG Chengchen2,3, CHEN Yun1

  1. Fudan University, Shanghai 200433, China
    2. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China
    3. ZTE Corporation, Shenzhen 518057, China
  • Received: 2024-08-04 Online: 2024-12-20 Published: 2024-12-03
  • About author:WU Jingguo received his BS degree in microelectronics science and engineering from Xi’an Jiaotong University, China in 2021. He is currently pursuing a master’s degree at Fudan University, China. His main research interest is stochastic computing.
    ZHU Jingwei is a PhD candidate in the School of Microelectronics, Fudan University, China. He received his MASc degree from Fudan University. His current research interests include neuromorphic computing, spiking neural networks, and networks-on-chip (NoC).
    XIONG Xiankui graduated from the University of Electronic Science and Technology of China. He is the chief architect of the Wireless Department and the leader of the Prospect Team of the Smart Computing Technical Committee, ZTE Corporation. He has long been engaged in research on computing systems and architectures, advanced computing paradigms, and heterogeneous computing accelerators. He has led the system architecture design of the ZTE ATCA advanced telecom computing platform, server storage platform, smart NIC, and AI accelerator.
    YAO Haidong graduated from the University of Science and Technology of China and is a senior expert in the field of wireless communications at ZTE Corporation. His work focuses on the research and design of deep learning, large language model network architectures, and compilation and conversion technologies.
    WANG Chengchen graduated from the Department of Precision Instrument, Tsinghua University, China, and now works at ZTE Corporation. His research directions include in-memory computing, optical computing, and probabilistic computing.
    CHEN Yun (chenyun@fudan.edu.cn) received her BSc degree from the University of Science and Technology of China in 2000 and her PhD degree from Fudan University, China in 2007. She joined Fudan University in the same year, where she has been a professor with the State Key Laboratory of ASIC and Systems. She has published more than 60 articles in renowned international journals and conference proceedings and applied for more than 20 patents. Her research interests include baseband processing technologies for wireless communication and ultralow-power FEC IC design. Prof. CHEN is a member of the Steering Committee of SIPS and the ASICON Technical Committee. She serves as a TPC member for ASSCC. She also serves as the Chair Secretary of the Shanghai Chapter of IEEE SSCS and the Co-Chair of the Circuit System Division of the Chinese Institute of Electronics. She is a senior member of IEEE.

Abstract:

Deep neural networks (DNNs) are widely used in image recognition, image classification, and other fields. However, as model sizes grow, DNN hardware accelerators face the challenge of increasing area overhead and energy consumption. In recent years, stochastic computing (SC) has been considered a way to realize deep neural networks while reducing hardware consumption. In this paper, a probabilistic compensation algorithm is proposed to address the accuracy problem of stochastic computing, and a fully parallel neural network accelerator based on a deterministic method is designed. Software simulation results show that the probabilistic compensation algorithm achieves an accuracy of 95.32% on the CIFAR-10 dataset, which is 14.98% higher than that of the traditional SC algorithm. The deterministic algorithm achieves an accuracy of 95.06% on the CIFAR-10 dataset, which is 14.72% higher than that of the traditional SC algorithm. Very large-scale integration (VLSI) hardware test results show that the normalized energy efficiency of the fully parallel neural network accelerator based on the deterministic method is 31% higher than that of a circuit based on binary computing.
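For intuition, the following is a minimal Python/NumPy sketch of the two SC ideas the abstract contrasts: conventional stochastic multiplication with random bitstreams, whose accuracy improves with stream length, and a deterministic scheme that is exact up to quantization. It is an illustration of the general techniques under stated assumptions, not the paper's circuit or algorithm, and all function names and parameters are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def encode(x, length):
    # Unipolar SC encoding: x in [0, 1] becomes a bitstream whose
    # fraction of 1s is, on average, x.
    return (rng.random(length) < x).astype(np.uint8)

def sc_multiply(x, y, length=1024):
    # Conventional SC multiplication: bitwise AND of two independent
    # random streams; the fraction of 1s in the output estimates x * y.
    return (encode(x, length) & encode(y, length)).mean()

def det_multiply(x, y, n=20):
    # Deterministic SC multiplication: unary streams, one repeated and
    # the other held, so every bit pair meets exactly once. The result
    # equals round(x*n) * round(y*n) / n**2, i.e., exact up to
    # quantization, at the cost of an n**2-bit output stream.
    a = (np.arange(n) < round(x * n)).astype(np.uint8)
    b = (np.arange(n) < round(y * n)).astype(np.uint8)
    return (np.tile(a, n) & np.repeat(b, n)).mean()

for length in (64, 1024, 16384):
    print(length, sc_multiply(0.8, 0.6, length))  # converges to 0.48
print(det_multiply(0.8, 0.6))  # exactly 0.48, since 0.8 and 0.6 are multiples of 1/20

The random estimate fluctuates around the exact product 0.48 and tightens as the stream lengthens, while the deterministic variant returns 0.48 exactly; this accuracy-versus-stream-length trade-off is what compensation schemes and deterministic methods such as those studied in the paper aim to improve.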

Key words: stochastic computing, hardware accelerator, deep neural network