Abstract:
Self-supervised contrastive learning (SCL) has emerged as a key technique for unsupervised fine-grained feature extraction in deep learning. However, SCL often degrades in fine-grained scenarios, primarily due to its excessive reliance on global representations, insufficient modeling of key regions, and underutilization of multi-scale detail. To address these limitations, this paper proposes a novel fine-grained contrastive learning method built on two core modules. A region-guided centroid regression module adaptively guides the segmentation and extraction of regions of interest, mitigating the information loss common to traditional ROI methods. A lightweight multi-scale extraction module integrates multi-scale receptive fields to improve both the efficiency and accuracy of feature representation. Experiments on multiple real-world datasets demonstrate that the proposed method yields significant improvements on tasks such as biometric recognition and image classification. Compared with mainstream SCL approaches, it reduces the equal error rate (EER) by 52.0% relative to SimCLR and by 66.7% relative to BYOL, validating its effectiveness and superiority.