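Qu Yi, Chen Ying. Unsupervised monocular depth estimation based on stable photometric loss[J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(11): 158-167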
Unsupervised monocular depth estimation based on stable photometric loss
  
DOI:
Keywords: monocular depth estimation  unsupervised learning  deep learning  photometric loss  robustness
Funding: Supported by the National Natural Science Foundation of China (62173160)
Author  Institution
Qu Yi  Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
Chen Ying  Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
Abstract:
      Photometric loss plays an important role in training video-based unsupervised monocular depth estimation models. However, it tends to produce large errors in particular regions such as weakly textured regions and edges, which makes the supervision signal for network training highly unstable. To address this problem, a more robust unsupervised monocular depth estimation method is proposed. The method first combines a dual-branch encoder with a channel attention module to improve the single-frame depth network's ability to extract depth features, and then uses the output of the single-frame depth network to guide multi-frame depth estimation, improving accuracy. On this basis, a new photometric loss function is designed. Computing the photometric loss on image gradients eliminates the unreasonable supervision caused by local brightness changes. The differences between consecutive pixels are used to define blurred pixels, and a binary mask then excludes the erroneous supervision produced by blurred edge pixels in the target frame and the reconstructed target frame. On the KITTI dataset, several metrics improve, including the mean relative error, the squared relative error, and the root mean square error; the mean relative error and the squared relative error are reduced to 0.075 and 0.548, respectively. The experimental results show that the proposed method further improves on the performance of existing models when compared with other advanced methods.
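The abstract describes a photometric loss computed on image gradients and restricted by a binary mask that drops blurred edge pixels. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch under stated assumptions: grayscale frames, an L1 difference of forward-difference gradients, and a hypothetical blur rule that flags a pixel when its gradient magnitude falls in an intermediate band between `low` and `high`. The function names and both thresholds are illustrative and do not come from the paper.

```python
import numpy as np


def image_gradients(img):
    """Forward differences along x and y, zero-padded to keep the input shape."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy


def blurred_pixels(img, low=0.05, high=0.5):
    """Hypothetical blur rule (an assumption, not the paper's definition):
    a pixel is 'blurred' when the difference to its neighbours is neither
    flat (<= low) nor a sharp step (>= high)."""
    gx, gy = image_gradients(img)
    mag = np.abs(gx) + np.abs(gy)
    return (mag > low) & (mag < high)


def gradient_photometric_loss(target, recon, low=0.05, high=0.5):
    """L1 loss between image gradients, averaged over pixels that are not
    flagged as blurred in either the target or the reconstructed frame."""
    tgx, tgy = image_gradients(target)
    rgx, rgy = image_gradients(recon)
    per_pixel = np.abs(tgx - rgx) + np.abs(tgy - rgy)
    # Binary mask: True where the pixel is kept for supervision.
    keep = ~(blurred_pixels(target, low, high) | blurred_pixels(recon, low, high))
    return float((per_pixel * keep).sum() / max(int(keep.sum()), 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))
    # A uniform brightness offset leaves the gradients unchanged,
    # so the gradient-domain loss stays at (numerically) zero.
    print(gradient_photometric_loss(target, target + 0.1))
```

Working in the gradient domain means a constant brightness shift between the target and the reconstruction cancels out, which mirrors the motivation stated in the abstract. The band-based mask is just one simple way to turn "differences between consecutive pixels" into a binary decision; the actual criterion in the paper may differ.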