Abstract:
For semantic segmentation networks, two problems arise when low-level and high-level features are fused in the encoder-decoder structure: (1) feature extraction in the spatial and channel dimensions is not performed jointly, so the combined features fail to capture global context information; (2) feature fusion does not fully exploit the low-level and high-level feature maps, which leads to blurred semantic boundaries. To address these problems, a global atrous spatial pyramid pooling module is designed; it not only extracts multi-scale spatial information and exploits channel-wise image information, but also enhances feature reuse in the encoder stage. In addition, a feature fusion attention module is designed to connect low-level features, high-level features, and the newly generated features at different stages of the encoder. Experiments show that the proposed algorithm achieves 77.92% mIoU on the Cityscapes dataset.
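As a rough illustration of the kind of low-/high-level feature fusion with channel attention described above, the following PyTorch sketch upsamples a high-level feature map, concatenates it with a low-level one, and reweights the fused result with a squeeze-and-excitation style channel gate. The module name, channel counts, and attention design are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionAttention(nn.Module):
    """Hypothetical feature-fusion attention block (not the paper's design):
    upsample the high-level feature map, concatenate it with the low-level
    one, and reweight the result with channel attention."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        fused = low_channels + high_channels
        self.project = nn.Sequential(
            nn.Conv2d(fused, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Squeeze-and-excitation style channel attention over the fused map.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Bring the high-level map to the low-level spatial resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        fused = self.project(torch.cat([low, high], dim=1))
        return fused * self.attention(fused)


if __name__ == "__main__":
    block = FusionAttention(low_channels=256, high_channels=512, out_channels=256)
    low = torch.randn(1, 256, 64, 64)    # low-level encoder feature
    high = torch.randn(1, 512, 16, 16)   # high-level encoder feature
    print(block(low, high).shape)        # torch.Size([1, 256, 64, 64])
```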