Journal of Electronic Measurement and Instrumentation

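Fan Bo, Gao Weiwei, Shan Mingtao, Fang Yu. Lightweight semantic segmentation of UAV traffic scene objects combining attention mechanism and ghost feature mapping[J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(3): 21-28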

Lightweight semantic segmentation of UAV traffic scene objects combining attention mechanism and ghost feature mapping

DOI:

Keywords: semantic segmentation; UAV traffic scenes; lightweight; attention mechanism; ghost feature mapping; loss function

Funding: Supported by the National Natural Science Foundation of China (61703268)

Author	Institution
Fan Bo	1.School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Gao Weiwei	1.School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Shan Mingtao	1.School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Fang Yu	1.School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science

Abstract views: 1186

Full-text downloads: 3106

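Abstract (translated from the Chinese):

To address blurred edge information and inaccurate small-object feature extraction when lightweight semantic segmentation algorithms are applied to high-resolution UAV traffic scene images, a lightweight semantic segmentation algorithm that fuses an attention mechanism with ghost feature mapping is proposed. First, a hybrid attention module is embedded in the 8x and 16x down-sampling stages of the semantic branch of BiSeNet V2 to reassign the weights of the deep feature maps and strengthen the extraction of local key features. Next, ghost feature mapping units are used to optimize the conventional convolution layers, further reducing computational cost. Finally, a dynamic threshold loss function supervises training and adjusts the training weights of high-loss hard samples. The improved algorithm was trained and tested on the UAVid dataset. Its mean intersection over union (mIoU) is 52.7%, which is 7.8% higher than the model before improvement, and at an input image size of 1280×736 the inference speed reaches 81.6 FPS, which meets the requirement for real-time segmentation. The results show that the algorithm adapts well to complex traffic scenes and effectively alleviates blurred edge information and poor small-object segmentation accuracy.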
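The cost reduction attributed to ghost feature mapping can be illustrated with a simple parameter count. The sketch below assumes the unit follows the general Ghost module idea from GhostNet: an ordinary convolution produces a reduced set of intrinsic feature maps, and cheap depthwise operations generate the remaining "ghost" maps. The function names, the `ratio` and `cheap_k` parameters, and the example channel sizes are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a Ghost-style unit (assumed structure, bias ignored).

    ratio   : assumed ratio of total output maps to intrinsic maps
    cheap_k : assumed kernel size of the cheap depthwise operation
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    # Ordinary convolution that produces only the intrinsic feature maps.
    primary = conv_params(c_in, intrinsic, k)
    # Depthwise operation: each intrinsic map yields (ratio - 1) ghost maps.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer: 128 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(128, 128, 3)
ghost = ghost_params(128, 128, 3)
print(standard, ghost, round(standard / ghost, 2))
```

Under these assumptions the saving approaches the chosen `ratio` as the channel count grows, because the depthwise term is small compared with the ordinary convolution.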

Abstract (English):

To solve the problems of blurred edge information and poor accuracy of small-target feature extraction when lightweight semantic segmentation algorithms are applied to high-resolution UAV traffic scene images, a lightweight semantic segmentation algorithm combining an attention mechanism and ghost feature mapping is proposed. First, a hybrid attention module is embedded in the 8-fold and 16-fold down-sampling stages of the semantic branch of BiSeNet V2 to redistribute the weights of the deep feature maps and enhance local key feature extraction. Then, the ghost feature mapping unit is used to optimize the traditional convolution layers and further reduce the computational cost. Finally, a dynamic threshold loss function is applied to supervise training, adjusting the training weights of high-loss difficult samples. The improved algorithm is trained and tested on the UAVid dataset, where it achieves an mIoU of 52.7%, which is 7.8% higher than BiSeNet V2. When the input image size is 1280×736, the inference speed reaches 73.6 FPS, meeting the requirements of real-time segmentation. The results show that the algorithm adapts well to complex traffic scenes and effectively mitigates blurred edge information and poor accuracy on small objects.
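For readers unfamiliar with the mIoU metric reported in the abstract, the following is a minimal sketch of how it is commonly computed: the per-class intersection over union is averaged over the classes that occur in either the prediction or the ground truth. The function name `mean_iou` and the toy label sequences are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Return (per-class IoU list, mIoU) for two flat sequences of class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # Classes absent from both prediction and ground truth are skipped.
        ious.append(intersection[c] / union if union > 0 else None)
    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else float("nan")
    return ious, miou


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious, miou = mean_iou(pred, target, num_classes=3)
print(ious, miou)
```

In practice the same counts are usually accumulated over every pixel of every test image before the ratios are taken.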
