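Liu Dan, Ma Tongwei. Pedestrian detection method based on semantic information[J]. Journal of Electronic Measurement and Instrumentation, 2019, 33(1): 54-60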
Pedestrian detection method based on semantic information
  
Chinese keywords: autonomous driving  vehicle detection  convolutional neural network
English keywords: pedestrian detection  semantic segmentation  convolutional neural network
Funding: Supported by the Key Scientific Research Project of Higher Education Institutions of Henan Province (19B520005)
Author  Institution
Liu Dan  1. Department of Computer Science & Technology, Henan Institute of Technology
Ma Tongwei  1. Department of Computer Science & Technology, Henan Institute of Technology
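Abstract (translated from Chinese):
      With the introduction of convolutional neural networks (CNNs), the accuracy of pedestrian detection has improved considerably. Although CNN models can learn different variations of a target, pedestrian detection in autonomous driving scenes still faces major challenges, chiefly wide scale variation, illumination changes, and varying degrees of occlusion. Building on existing CNNs, a more robust pedestrian detection method is proposed whose main idea is to take the original detector and use pixel-level semantic information as additional supervision when training the CNN. The algorithm first extracts CNN feature maps at different scales, tiles candidate boxes of different sizes over these feature maps, and adds a convolutional layer to classify and regress the candidate boxes. The same feature maps are also used to generate semantic segmentation maps, so the network ends in two branches that supervise the detection result and the segmentation result, respectively. Results on the recent CityPersons pedestrian detection dataset show that incorporating semantic information raises the detection success rate without increasing running time: on the dataset's 1 024×2 048 pixel images, detection takes only 0.3 s per image on average.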
Abstract (English):
      With the emergence of convolutional neural networks (CNNs), pedestrian detection has improved considerably. Although CNN models can learn different variations of objects, pedestrian detection in autonomous driving still faces several challenges, mainly large scale variation, illumination variation, and varying levels of occlusion. In this paper, building on previous CNN models, a more robust pedestrian detection method is proposed. The main idea is to incorporate pixel-level semantic information into the original detection framework as additional supervision. The method first extracts feature maps at different scales from the CNN, tiles anchor boxes of various sizes over them, and appends a convolutional layer that classifies and regresses these anchor boxes. Meanwhile, semantic segmentation maps are generated from the same feature maps. Finally, two streams supervise detection and segmentation, respectively. Experiments on the recent CityPersons pedestrian detection dataset show that semantic segmentation can significantly improve detection accuracy without extra running time; the average processing time is only 0.3 s per image on the dataset's 1 024×2 048 pixel images.
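The pipeline summarized in the abstract (anchor boxes tiled over multi-scale feature maps, plus a segmentation branch that adds a second supervision signal) can be illustrated with a minimal, framework-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the strides, the anchor heights, the 0.41 width-to-height ratio, the helper names `tile_anchors` and `joint_loss`, and the choice of a simple weighted sum to combine the two branches.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def tile_anchors(image_h: int, image_w: int, stride: int,
                 heights: List[float], aspect_ratio: float = 0.41) -> List[Box]:
    """Place one anchor per height at the centre of every feature-map cell.

    stride, heights and aspect_ratio (width / height) are illustrative
    assumptions, not values reported in the paper.
    """
    anchors: List[Box] = []
    for row in range(image_h // stride):
        for col in range(image_w // stride):
            # Centre of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for h in heights:
                w = h * aspect_ratio
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def joint_loss(cls_loss: float, reg_loss: float, seg_loss: float,
               seg_weight: float = 1.0) -> float:
    """Combine detection losses with a weighted segmentation loss.

    The weighted sum is an assumed way to merge the two supervision streams.
    """
    return cls_loss + reg_loss + seg_weight * seg_loss


# Hypothetical feature pyramid: stride -> anchor heights in pixels.
PYRAMID: Dict[int, List[float]] = {
    8: [40.0, 56.0],
    16: [80.0, 112.0],
    32: [160.0, 224.0],
}

# Anchors for a 1 024 x 2 048 image, grouped by feature-map stride.
all_anchors = {
    stride: tile_anchors(1024, 2048, stride, heights)
    for stride, heights in PYRAMID.items()
}
```

Finer feature maps (small stride) carry the small anchors and coarser maps carry the large ones, which is one common way to cover wide scale variation. In a setup like this the segmentation term affects training only, so the segmentation branch can be skipped at inference time, which would be consistent with the abstract's claim of no added running time.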