
Acceleration of Deep Neural Network Training Using Field Programmable Gate Arrays

Guta Tesema Tufa (57948167600), Faculty of Electrical and Computer Engineering, Arba Minch Institute of Technology, Arba Minch, Ethiopia; Anchit (56905383800); Andargie, School of Electrical and Computer Engineering, Addis Ababa Institute of Technology, Ethiopia

Computational Intelligence and Neuroscience, 2022 (Vol. 2022, pp. -)

ISSN: 1687-5265


DOI:

Document category:

Article

English

Keywords: Acceleration; Algorithms; Neural Networks, Computer; Convolutional neural networks; Deep neural networks; Energy efficiency; Gradient methods; Large datasets; System-on-chip; Computational resources; Energy efficient; Network inference; Network training; Neural network training; Performance; Speed up; Field programmable gate arrays (FPGA)
English abstract
Convolutional neural network (CNN) training often requires a considerable amount of computational resources. In recent years, several studies have proposed accelerators for CNN inference and training, in which FPGAs have demonstrated good performance and energy efficiency. To speed up processing, CNN training demands additional computational resources such as memory bandwidth, FPGA platform resources, time, power, and large training datasets, and it is constrained by the need for improved hardware acceleration to scale beyond existing data and model sizes. This paper proposes a procedure for energy-efficient CNN training in collaboration with an FPGA-based accelerator. We employ optimizations such as quantization, a common model compression technique, to speed up the CNN training process. Additionally, a gradient accumulation buffer is used to ensure maximum operating efficiency while preserving the gradient descent of the learning algorithm. To validate the design, we implemented the AlexNet and VGG-16 models on an FPGA board and on a laptop CPU alongside a GPU. The design achieves 203.75 GOPS with the AlexNet model and 196.50 GOPS with the VGG-16 model on the Terasic DE1-SoC. Our results also show that the FPGA accelerators are more energy efficient than the other platforms. © 2022 Guta Tesema Tufa et al.
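The abstract names two optimizations: quantization of values and a gradient accumulation buffer that sums gradients over several micro-batches before applying one weight update. The sketch below is an illustrative CPU-side model of those two ideas only, not the paper's FPGA design; the 1-D least-squares problem, the function names, and all parameter values are assumptions chosen for clarity.

```python
def quantize(v, bits=8, scale=4.0):
    """Uniform symmetric quantization, a common model compression
    technique: clamp v to [-scale, scale] on a (2^bits - 1)-level grid."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(v / scale * qmax)))
    return q * scale / qmax

def grad(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w*x - y)^2 w.r.t. w."""
    return (w * x - y) * x

def train_accumulated(data, w=0.0, lr=0.1, accum_steps=4, epochs=50):
    """SGD with a gradient accumulation buffer: gradients are summed
    for accum_steps micro-batches, then one averaged update is applied,
    matching full-batch gradient descent when the window spans the data."""
    buf = 0.0      # gradient accumulation buffer
    count = 0
    for _ in range(epochs):
        for x, y in data:
            buf += grad(w, x, y)
            count += 1
            if count == accum_steps:
                w -= lr * buf / accum_steps   # one update per window
                buf, count = 0.0, 0
    return w

# Fit y = 2*x from four samples; with accum_steps equal to the dataset
# size, each update is exactly a full-batch gradient-descent step.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_fit = train_accumulated(samples)
```

Here `w_fit` converges to the true slope 2.0, and `quantize` introduces at most half a grid step of error per value, which is the trade-off the paper exploits to shrink arithmetic on the FPGA.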
