COMBINING FEATURE FUSION AND MIXED ATTENTION FOR FINE-GRAINED IMAGE CLASSIFICATION
Abstract
To fully extract the key local features in fine-grained images, we propose a fine-grained image classification algorithm that combines feature fusion with mixed attention. First, Squeeze-and-Excitation (SE) networks introduce channel attention to strengthen feature extraction. Second, a feature fusion step fuses low-level and high-level semantic information after cross-channel interaction. Third, we improve the selective sparse sampling (S3N) method by introducing spatial attention to obtain salient sampling maps. The resulting two-branch classification model can be trained end-to-end, and cross-validation is used to improve classification accuracy. The model achieves classification accuracies of 87.84%, 93.59%, and 94.25% on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, respectively, outperforming the backbone network and current mainstream algorithms.
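As a rough illustration of the channel attention idea, the following sketch shows the generic squeeze, excitation, and scale steps of an SE block in plain Python. It is not the authors' implementation: the function name `se_channel_attention`, the toy weights `w_reduce` and `w_expand`, and the omission of biases are all assumptions made for illustration, and a real model would use learned weights in a deep learning framework.

```python
import math

def se_channel_attention(feature_map, w_reduce, w_expand):
    """Generic SE-style channel attention (illustrative, biases omitted).

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C/r) x C weights of the bottleneck layer (hypothetical values).
    w_expand:    C x (C/r) weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels; sigmoid yields gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: reweight every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates

# Toy input: 2 channels of size 2 x 2, reduction to a single hidden unit.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
w_reduce = [[0.5, 0.5]]      # shape 1 x 2, assumed for the example
w_expand = [[1.0], [-1.0]]   # shape 2 x 1, assumed for the example
scaled, gates = se_channel_attention(features, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is emphasized and the second suppressed.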