Design and Implementation of FPGA Acceleration for YOLOv3-tiny

  • Abstract: A hardware acceleration architecture based on a structured compression scheme is proposed for the YOLOv3-tiny network. Sparsity training and channel pruning reduce the network's computational load, compressing it by 48% relative to the original network. Fixed-point quantization speeds up computation while preserving network accuracy; loop tiling and channel-interleaved transfer reduce on-chip storage and speed up data transfer; and multi-channel parallelism accelerates network computation. Dedicated convolution, pooling, and upsampling modules improve computational efficiency, and the whole system runs stably at a clock frequency of 150 MHz. Experiments show that, with three-channel 416×416 images as input, the design reaches a forward inference speed of 7.04 frames per second (FPS) on the Xilinx Zynq UltraScale+ MPSoC platform, delivering 28.03 GOP/s of throughput at a power consumption of 2.91 W.
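The three reported figures (7.04 FPS, 28.03 GOP/s, 2.91 W) imply a few derived quantities that are useful when comparing accelerators. The sketch below is only arithmetic on the numbers stated in the abstract. The function name `derived_metrics` and the dictionary keys are illustrative, not from the paper. The per-frame latency assumes frames are processed one at a time with no overlap between them, which the abstract does not state.

```python
# Figures reported in the abstract.
FPS = 7.04               # forward inference speed, frames per second
THROUGHPUT_GOPS = 28.03  # sustained throughput, GOP/s
POWER_W = 2.91           # power consumption, watts


def derived_metrics(fps: float, throughput_gops: float, power_w: float) -> dict:
    """Derive per-frame and per-watt figures from the reported measurements.

    Hypothetical helper for illustration; not part of the described design.
    """
    return {
        # Operations executed per frame: (GOP/s) / (frames/s) = GOP/frame.
        "gop_per_frame": throughput_gops / fps,
        # Energy efficiency: (GOP/s) / W = GOP/s/W.
        "gops_per_watt": throughput_gops / power_w,
        # Assumption: one frame in flight at a time, so latency = 1 / FPS.
        "latency_ms": 1000.0 / fps,
        # Energy per frame: W / (frames/s) = J/frame.
        "joules_per_frame": power_w / fps,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(FPS, THROUGHPUT_GOPS, POWER_W).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the figures work out to roughly 3.98 GOP per frame, 9.63 GOP/s/W, 142 ms per frame, and 0.41 J per frame.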

     
