Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报)

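Hong Feng, Lu Changhua, Jiang Weiwei, Wang Tao, Fang Hengyang. Research on a detection and tracking algorithm for video target vehicles based on spatio-temporal consistency constraints [J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(3): 105-112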

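Research on a detection and tracking algorithm for video target vehicles based on spatio-temporal consistency constraints (基于时空一致性约束视频目标车辆的检测与跟踪算法研究)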

Research on vehicle detection and tracking algorithm based on spatio-temporal consistent dual-stream network video target

DOI：

Keywords: spatio-temporal consistency; vehicle tracking; Transformer; cross-feature pyramid network

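Funding: Supported by the National Major Science and Technology Research Project (JZ2015KJZZ0254), the Chinese Academy of Sciences STS Major Project (KFJ STS ZDTP 079), the Anhui Province Outstanding Top Talent Cultivation Program (gxyq2018110, gxyq2019111), and the National Fund Cultivation Project (CZ2021GP08).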

Author	Institution
Hong Feng (洪锋)	1. School of Computer and Information, Hefei University of Technology
Lu Changhua (鲁昌华)	1. School of Computer and Information, Hefei University of Technology
Jiang Weiwei (蒋薇薇)	1. School of Computer and Information, Hefei University of Technology
Wang Tao (王涛)	1. School of Computer and Information, Hefei University of Technology
Fang Hengyang (方恒阳)	1. School of Computer and Information, Hefei University of Technology

Abstract views: 1079

Full-text downloads: 2557

Abstract:

Target perception in complex scenes is one of the most important research areas for deep learning in computer vision, and vehicle detection and tracking in complex traffic scenes is an active research topic. In video object detection, the temporal features of moving objects are underused, so temporal features across long sequences are easily ignored. This paper proposes a spatio-temporally consistent algorithm for detecting and tracking vehicles in video. The algorithm uses a two-branch network. The first branch consists of Transformer modules based on spatial correlation; it judges the correlation between preceding and following frames, perceives the consistency between adjacent frames, and predicts the degree of spatio-temporal consistency of the target vehicle. The second branch consists of modules based on cross-feature pyramid fusion; it extracts local information about the detected object, combines shallow spatial edge information with deep semantic features, and extracts features describing the object's spatial position. By combining the Transformer mechanism with the cross-feature pyramid module, the network exploits the Transformer's sensitivity to temporal correlation across long sequences and the feature pyramid module's sensitivity to edge information to detect and track objects in video frames, preserving the long-range correlation between adjacent frames and deeply fusing edge and deep feature information. Experimental results show that the proposed dual-branch network achieves better accuracy and faster convergence in video object tracking and detection. Experiments on video salient object detection further demonstrate the effectiveness and generalization ability of the algorithm.
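The two ideas combined in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' network: the function names `frame_attention` and `fuse_pyramid`, the toy feature vectors, and the use of plain scaled dot-product attention and a standard top-down (FPN-style) add are all assumptions made for illustration, standing in for the paper's Transformer branch and cross-feature pyramid fusion.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention(prev_feats, curr_feats):
    """Attention weights from each current-frame feature (query) to every
    previous-frame feature (key), using scaled dot-product scores.
    Hypothetical stand-in for the temporal-association branch."""
    d = len(curr_feats[0])
    weights = []
    for q in curr_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in prev_feats]
        weights.append(softmax(scores))
    return weights


def fuse_pyramid(shallow, deep):
    """Top-down fusion of two 1-D feature maps: nearest-neighbour upsample
    the coarse deep map to the shallow map's length, then add element-wise.
    Plain FPN-style fusion, assumed here in place of the paper's
    cross-feature pyramid module."""
    if len(shallow) % len(deep) != 0:
        raise ValueError("shallow length must be a multiple of deep length")
    factor = len(shallow) // len(deep)
    upsampled = [v for v in deep for _ in range(factor)]
    return [s + u for s, u in zip(shallow, upsampled)]


# Toy data (assumed): two object features in the previous and current frame.
prev = [[1.0, 0.0], [0.0, 1.0]]
curr = [[0.9, 0.1], [0.2, 0.8]]
w = frame_attention(prev, curr)

# Associate each current-frame object with its most-attended previous one.
matches = [row.index(max(row)) for row in w]

# Shallow (edge-like, length 4) and deep (semantic, length 2) feature maps.
fused = fuse_pyramid([0.1, 0.2, 0.3, 0.4], [1.0, 2.0])
```

Here `matches` links each current-frame object to the previous-frame object it attends to most strongly, which is the intuition behind using attention for frame-to-frame consistency, while `fused` carries both fine-scale and coarse-scale information at the shallow map's resolution.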
