Abstract: To address the low accuracy of detecting multi-scale, multi-type steel surface defects against complex backgrounds, this paper proposes an improved YOLOv5 algorithm that integrates HGnetv2 with an attention mechanism. First, HGnetv2 combined with an attention mechanism is used as the backbone to strengthen feature extraction for small defects. Second, in the feature fusion layer, attention mechanisms and involution operations are combined to aggregate shallow-layer edge features with deep-layer semantic information. Third, a CBME_C2f module replaces the C3_Bottleneck module to improve gradient flow. Fourth, a new bounding box loss function, VCIoU, computes positional features between the vertices and center points of the predicted and target boxes, improving bounding box regression precision. Finally, MetaAconC is introduced to adaptively adjust the non-linearity of the activation for each feature map channel, improving feature extraction from complex backgrounds. Experimental results on the NEU-DET dataset show that the proposed method achieves an mAP@50 of 81.4% and an mAP@50:95 of 44.1%, which are 5.4% and 2.8% higher than those of YOLOv5s, respectively. For small defects such as crazing, detection accuracy reaches 55.4%, an 18.1% improvement over YOLOv5s, while the model maintains a detection speed of 80.6 FPS. Compared with other mainstream defect detection algorithms, the proposed algorithm improves accuracy while meeting the real-time requirements of steel surface defect detection.
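
The per-channel adaptive activation attributed to MetaAconC above can be illustrated with a minimal sketch. It assumes the commonly cited ACON-C form, f(x) = (p1 - p2)·x·σ(β·(p1 - p2)·x) + p2·x, with one switching factor β per channel. The helper `meta_beta`, its parameters `w` and `b`, and the nested-list feature-map layout are hypothetical simplifications chosen for illustration; they are not the network used in this paper, where β would normally come from a small learned module.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acon_c(x: float, p1: float, p2: float, beta: float) -> float:
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x


def meta_beta(channel, w: float = 1.0, b: float = 0.0) -> float:
    """Hypothetical stand-in for the learned module that generates beta.

    Assumption: the channel's spatial mean is passed through one scalar
    affine map and a sigmoid, so beta lies in (0, 1).
    """
    flat = [v for row in channel for v in row]
    return sigmoid(w * (sum(flat) / len(flat)) + b)


def meta_acon_c(feature_map, p1, p2):
    """Apply ACON-C to a [C][H][W] nested list with one beta per channel."""
    out = []
    for c, channel in enumerate(feature_map):
        beta = meta_beta(channel)  # switching factor for this channel only
        out.append(
            [[acon_c(v, p1[c], p2[c], beta) for v in row] for row in channel]
        )
    return out
```

Under these assumptions, β = 0 reduces each channel to the linear map (p1 + p2)·x / 2, while a large β approaches max(p1·x, p2·x), so the switching factor controls how non-linear each channel's activation is. With p1 = 1, p2 = 0 and β = 1 the function coincides with Swish.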