Fan B, Gao W W, Shan M T, Fang Y. Lightweight semantic segmentation of UAV traffic scene objects combining attention mechanism and ghost feature mapping [J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(3): 21-28.
Lightweight semantic segmentation of UAV traffic scene objects combining attention mechanism and ghost feature mapping
  
DOI:
Keywords: semantic segmentation; UAV traffic scenes; lightweight; attention mechanism; ghost feature mapping; loss function
Funding: Supported by the National Natural Science Foundation of China (61703268)
Author affiliations
Fan Bo  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Gao Weiwei  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Shan Mingtao  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Fang Yu  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science
Abstract:
      To address the blurred edge information and the poor feature-extraction accuracy for small objects that arise when lightweight semantic segmentation algorithms are applied to high-resolution UAV traffic scene images, a lightweight semantic segmentation algorithm combining an attention mechanism with ghost feature mapping is proposed. First, hybrid attention modules are embedded in the 8× and 16× down-sampling stages of the BiSeNet V2 semantic branch to redistribute the weights of the deep feature maps and strengthen the extraction of locally salient features. Next, ghost feature mapping units replace the conventional convolution layers to further reduce the computational cost. Finally, a dynamic-threshold loss function supervises training and raises the training weight of high-loss hard samples. Trained and tested on the UAVid dataset, the improved algorithm reaches a mean intersection over union (mIoU) of 52.7%, an improvement of 7.8% over the original model, and achieves an inference speed of 81.6 FPS for 1280×736 input images, satisfying real-time segmentation requirements. The results show that the algorithm adapts well to complex traffic scenes and effectively alleviates blurred edges and poor segmentation accuracy on small objects.
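To make the three components named in the abstract concrete, the sketch below shows one plausible PyTorch form of each: a ghost feature mapping unit in the style of the GhostNet Ghost module, a hybrid (channel + spatial, CBAM-style) attention block, and an OHEM-style cross-entropy whose dynamic threshold is the k-th largest per-pixel loss in the current batch. The class names, layer sizes, keep ratio, and the specific module forms are assumptions for illustration, not the paper's exact configuration.

# Sketch only: GhostNet-style ghost feature mapping, a CBAM-style hybrid
# attention block, and an OHEM-style "dynamic threshold" loss are assumed
# as the concrete forms of the components named in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GhostModule(nn.Module):
    """Ghost feature mapping unit: half of the output channels come from a
    standard convolution, the other half from a cheap depthwise convolution."""

    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=3, stride=1):
        super().__init__()
        init_ch = out_ch // ratio            # intrinsic feature maps
        cheap_ch = out_ch - init_ch          # "ghost" maps from the cheap operation
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, stride, kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, 1, dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        primary = self.primary(x)
        return torch.cat([primary, self.cheap(primary)], dim=1)


class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention, re-weighting deep features."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # channel attention from average- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca
        # spatial attention from channel-wise mean and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1)))
        return x * sa


class DynamicThresholdCELoss(nn.Module):
    """Cross-entropy averaged over the hardest pixels; the threshold is the
    k-th largest per-pixel loss in the batch (k = keep_ratio * valid pixels)."""

    def __init__(self, keep_ratio=0.25, ignore_index=255):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.ignore_index = ignore_index

    def forward(self, logits, target):
        pixel_loss = F.cross_entropy(logits, target, reduction="none",
                                     ignore_index=self.ignore_index)
        losses = pixel_loss[target != self.ignore_index]
        if losses.numel() == 0:
            return logits.sum() * 0.0
        n_keep = max(1, int(self.keep_ratio * losses.numel()))
        hard, _ = torch.topk(losses, n_keep)   # pixels above the dynamic threshold
        return hard.mean()


if __name__ == "__main__":
    feat = torch.randn(2, 64, 92, 160)           # e.g. an 8x-downsampled feature map
    feat = HybridAttention(64)(feat)             # re-weight deep features
    feat = GhostModule(64, 128)(feat)            # cheap channel expansion
    logits = nn.Conv2d(128, 8, 1)(feat)          # 8 UAVid classes
    logits = F.interpolate(logits, size=(736, 1280),
                           mode="bilinear", align_corners=False)
    target = torch.randint(0, 8, (2, 736, 1280))
    print(DynamicThresholdCELoss()(logits, target).item())

The cost saving claimed for ghost feature mapping comes from generating only part of the output channels with a full convolution and synthesizing the rest with a depthwise convolution; the hard-pixel loss concentrates the gradient on the high-loss samples the abstract refers to.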