DESIGN AND IMPLEMENTATION OF A YOLOv3-TINY ACCELERATOR ON FPGA
Abstract
Targeting the network structure of YOLOv3-tiny, a hardware acceleration architecture based on a structured compression scheme is proposed. The computational cost of the network was reduced by sparse training and channel pruning, compressing it by 48% compared with the original network. Fixed-point quantization was used to speed up computation while preserving network accuracy; loop tiling and channel-interleaved transmission were used to reduce on-chip storage and accelerate data transfer; and multi-channel parallelism was used to accelerate network computation. Dedicated convolution, pooling, up-sampling, and other calculation modules were designed to improve computational efficiency, and the whole system operates stably at a clock frequency of 150 MHz. Experiments show that, with three-channel 416×416 images as input, the design reaches a forward inference speed of 7.04 FPS on the Xilinx Zynq UltraScale+ MPSoC platform, with a throughput of 28.03 GOP/s and a power consumption of 2.91 W.
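The reported figures can be combined into a few derived metrics that make comparison with other accelerators easier. The following is a minimal sketch, not part of the original work: the function name `derived_metrics` and the dictionary keys are hypothetical, and it assumes that the throughput, frame rate, and power figures were all measured on the same workload.

```python
# Figures quoted in the abstract.
FPS = 7.04               # forward inference speed, frames per second
THROUGHPUT_GOPS = 28.03  # throughput, GOP/s
POWER_W = 2.91           # power consumption, watts
CLOCK_MHZ = 150.0        # system clock frequency, MHz


def derived_metrics(fps, gops, power_w, clock_mhz):
    """Combine the reported figures into derived metrics (hypothetical helper).

    Assumes all four inputs describe the same measurement run.
    """
    return {
        # Operations implied per processed frame, in GOP.
        "gop_per_frame": gops / fps,
        # Energy efficiency, in GOP/s per watt.
        "gops_per_watt": gops / power_w,
        # Energy spent per frame, in joules.
        "joules_per_frame": power_w / fps,
        # Average operations completed per clock cycle.
        "ops_per_cycle": (gops * 1e9) / (clock_mhz * 1e6),
    }


metrics = derived_metrics(FPS, THROUGHPUT_GOPS, POWER_W, CLOCK_MHZ)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the stated numbers imply roughly 3.98 GOP per frame and an energy efficiency of roughly 9.6 GOP/s per watt.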