Journal of Electronic Measurement and Instrumentation

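Zheng Zili, Xu Jian, Liu Xiuping, Liu Gaofeng, Zhao Yijian, Xia Daihong. Combined multi-attention and C-ASPP network for monocular 3D object detection[J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(8): 241-248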

Combined multi-attention and C-ASPP network for monocular 3D object detection

DOI：

Keywords: monocular 3D target detection; depth estimation; multi-attention mechanism; machine vision; autonomous driving

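Funding: Supported by the Shaanxi Provincial Department of Science and Technology Project (2018GY-173) and the Xi'an Science and Technology Bureau Project (GXYD7.5)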

Author	Institution
Zheng Zili	1. School of Electronics and Information, Xi'an Polytechnic University
Xu Jian	1. School of Electronics and Information, Xi'an Polytechnic University
Liu Xiuping	1. School of Electronics and Information, Xi'an Polytechnic University
Liu Gaofeng	1. School of Electronics and Information, Xi'an Polytechnic University
Zhao Yijian	1. School of Electronics and Information, Xi'an Polytechnic University
Xia Daihong	1. School of Electronics and Information, Xi'an Polytechnic University

Abstract:

Monocular 3D detection suffers from complex network structures and from imprecise target depth information after depth estimation. To address both problems, this paper proposes CDCN-3D, an end-to-end monocular 3D object detection network that combines multi-attention with depth estimation. First, to obtain salient target features, an adaptive spatial attention mechanism aggregates pixel features, which enhances local features and improves the network's representation ability. Second, to reduce the loss of local information during depth estimation, an improved C-ASPP lets each depth feature capture more accurate direction-aware and position-sensitive information. Finally, an accurate P-BEV maps the target's 3D information onto a 2D plane, and a single-stage object detector produces the detection output. Experiments on the KITTI dataset show that, at an FPS on par with existing monocular 3D detection networks, CDCN-3D achieves higher accuracy than the other networks: detection accuracy improves by 2.31%, 1.48%, and 1.14% for the Car, Pedestrian, and Cyclist classes, respectively, showing that the network can carry out the 3D object detection task.
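
The final step in the abstract maps the target's 3D information onto a 2D plane before detection. The Python sketch below illustrates that general idea only: it bins 3D points into a bird's-eye-view (BEV) grid by discarding the height axis. It is not the paper's P-BEV, whose details the abstract does not give. The function name `points_to_bev`, the grid ranges, the cell size, and the camera-coordinate convention (x right, y down, z forward) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def points_to_bev(
    points: Sequence[Tuple[float, float, float]],
    x_range: Tuple[float, float] = (-4.0, 4.0),  # assumed lateral extent (m)
    z_range: Tuple[float, float] = (0.0, 8.0),   # assumed forward extent (m)
    cell: float = 1.0,                           # assumed cell size (m)
) -> List[List[int]]:
    """Count 3D points per BEV cell, ignoring the height (y) axis.

    Hypothetical helper: rows index forward distance z, columns index
    lateral position x. Points outside the grid are skipped.
    """
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((z_range[1] - z_range[0]) / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, _y, z in points:
        inside_x = x_range[0] <= x < x_range[1]
        inside_z = z_range[0] <= z < z_range[1]
        if not (inside_x and inside_z):
            continue
        col = int((x - x_range[0]) / cell)
        row = int((z - z_range[0]) / cell)
        grid[row][col] += 1
    return grid


# Two nearby points share one cell; the last point lies outside the grid.
demo_points = [(0.5, 1.0, 2.5), (0.6, -1.0, 2.9), (-3.5, 0.0, 7.5), (10.0, 0.0, 1.0)]
demo_grid = points_to_bev(demo_points)
```

A grid like this can be passed to an ordinary 2D detector, which is the role the abstract assigns to the single-stage detector. A learned pipeline would project feature vectors rather than point counts.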
