Journal of Electronic Measurement and Instrumentation

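Liu Zhenbing, Sun Qiaoyu, Wang Shuwen, Xia Jiawei. Target detection of workshop tools based on sparse learnable proposal [J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(7): 13-21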

Target detection of workshop tools based on sparse learnable proposal

Keywords: tool detection; sparse and learnable; multi-scale features; Swin-Transformer; multi-head attention

Funding: Supported by the National Natural Science Foundation of China (62271236)

Author	Institution
Liu Zhenbing	School of Electronic Engineering, Jiangsu Ocean University, Lianyungang 222005, China
Sun Qiaoyu	School of Electronic Engineering, Jiangsu Ocean University, Lianyungang 222005, China
Wang Shuwen	School of Electronic Engineering, Jiangsu Ocean University, Lianyungang 222005, China
Xia Jiawei	School of Electronic Engineering, Jiangsu Ocean University, Lianyungang 222005, China

Abstract:

To address the large size differences between models of workshop tools and the wide variety of tool shapes, a workshop tool detection algorithm based on sparse learnable proposals is proposed. First, sparse representation and a learnable proposal mechanism are integrated to improve the robustness of the model and reduce the number of parameters required for detection. Second, a Swin-Transformer structure is introduced to strengthen the model's learning of both global context and fine detail, which overcomes the shortcomings of traditional convolutional neural networks in fusing high-level semantic information. Third, an improved multi-scale feature fusion network is used to fuse features of different scales, improving the detection of targets across scales. Finally, multi-head attention is combined with dynamic convolution to establish more precise and detailed connections between feature layers, further improving detection accuracy. In addition, the CIoU loss function is adopted so that bounding-box regression takes location, scale and shape information into account, making the regression more comprehensive and accurate. Experimental results show that the average detection accuracy of the proposed algorithm on the workshop tool detection task reaches 91%, at least 2.3% higher than current mainstream algorithms. The detection time for a single image is about 53 ms, which meets the requirements of real-time detection and indicates strong overall performance.
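The abstract names the CIoU loss but does not give its formula. As an illustration only, the sketch below follows the commonly used CIoU definition, 1 - IoU + (squared center distance / squared diagonal of the smallest enclosing box) + alpha * v, where v measures aspect-ratio mismatch. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format and the `eps` stabilizer are assumptions made for this example; they are not taken from the paper, whose implementation may differ.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2); hypothetical helper."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: plain intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Location term: squared distance between box centers, normalized by
    # the squared diagonal of the smallest box enclosing both boxes.
    center_dist2 = (((px1 + px2) - (tx1 + tx2)) ** 2
                    + ((py1 + py2) - (ty1 + ty2)) ** 2) / 4.0
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    diag2 = enc_w ** 2 + enc_h ** 2 + eps

    # Shape term: aspect-ratio consistency v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + center_dist2 / diag2 + alpha * v


if __name__ == "__main__":
    # A perfect prediction gives a loss close to zero; a shifted, reshaped
    # prediction is penalized by all three terms.
    print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))
    print(ciou_loss((2, 2, 12, 8), (0, 0, 10, 10)))
```

Unlike a plain IoU loss, the center-distance term still provides a useful signal when the predicted and ground-truth boxes do not overlap at all, which is one reason this family of losses is used for bounding-box regression.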
