Abstract: To address the large size differences and varied shapes among different models of workshop tools, a workshop tool detection method based on sparse learnable proposals is proposed. First, sparse representation and a learnable proposal mechanism are integrated to improve the robustness of the model and reduce the number of parameters required in the detection process. Second, the Swin-Transformer structure is introduced to strengthen the model's ability to learn both global context and fine detail, which effectively overcomes the shortcomings of traditional convolutional neural networks in fusing high-level semantic information. Third, an improved multi-scale feature fusion network is used to improve detection of targets at various scales through effective fusion of features from different scales. Fourth, multi-head attention and dynamic convolution are combined to establish more precise and detailed connections between different feature layers, further improving detection accuracy. Finally, the CIoU loss function is applied so that bounding-box regression accounts for location, scale, and shape information, making the predictions more comprehensive and accurate. Experimental results show that the average detection accuracy of the proposed method for workshop tool detection reaches 91%, at least 2.3% higher than current mainstream methods. The detection time for a single image is about 53 ms, which meets the needs of real-time detection and reflects the method's strong overall performance.
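The abstract names the CIoU loss without stating it. As an illustration only, the following is a minimal sketch of the standard CIoU formulation, which adds a center-distance penalty and an aspect-ratio penalty to 1 - IoU. It assumes axis-aligned boxes given as (x1, y1, x2, y2); the function name `ciou_loss` and the stabilizer `eps` are hypothetical and are not taken from the paper, whose implementation may differ.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss for two boxes in (x1, y1, x2, y2) format (illustrative)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes (scale and shape information).
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Location term: squared distance between box centers, normalized by
    # the squared diagonal of the smallest box enclosing both.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Shape term: penalizes aspect-ratio mismatch, weighted by alpha.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under these assumptions, two identical boxes give a loss of approximately zero, while non-overlapping boxes are still penalized in proportion to their center distance, which plain IoU loss cannot do because its value saturates once the boxes stop overlapping.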