Abstract:
Defect detection algorithms based on traditional methods are slow and have low detection accuracy, while defect detection methods based on deep learning cannot be deployed on resource-constrained devices because of their large computational cost and model size. To address these problems, the Vision Transformer (ViT) structure is introduced into the feature extraction part of the network, and GhostNet, which modifies the convolution structure, is used to make the defect detection network lightweight, yielding the network model Yolov5s-Ghost-ViT (YoloGT). Compared with the original model, the model size, computational cost, and parameter count of YoloGT are reduced by 42.4%, 47.9%, and 38.8%, respectively, and the mAP increases by 1.65 and 2.9 percentage points on the VOC and NEU data sets, respectively. The proposed algorithm is therefore better suited than the original algorithm to embedded real-time detection of steel plate surface defects in industrial settings.