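HONG Feng, LU Changhua, JIANG Weiwei, WANG Tao, FANG Hengyang. Research on detection and tracking algorithm of video target vehicles based on spatio-temporal consistency constraints[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(3): 105-112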
Research on Detection and Tracking Algorithm of Video Target Vehicles Based on Spatio-Temporal Consistency Constraints
Research on vehicle detection & tracking algorithm based on spatio-temporal consistent dual-stream network video target
DOI:
Keywords: spatio-temporal consistency; vehicle tracking; Transformer; cross-feature pyramid network
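Funding: Supported by the National Major Science and Technology Research Project (JZ2015KJZZ0254), the Chinese Academy of Sciences STS Major Project (KFJ STS ZDTP 079), the Anhui Province Outstanding Top Talent Cultivation Program (gxyq2018110, gxyq2019111), and the National Fund Cultivation Project (CZ2021GP08).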
|
|
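Abstract (Chinese, translated):
Object perception in complex scenes is one of the most important research areas of deep learning in computer vision, and vehicle
detection and tracking in complex traffic scenes is currently a hot topic for many researchers. Because video object detection makes
insufficient use of the temporal-dimension features of moving objects, temporal features across long sequences are easily overlooked.
This paper therefore proposes a spatio-temporal consistency algorithm for detecting and tracking vehicles in video. The algorithm uses
a dual-branch network. The first branch consists of Transformer modules based on spatial correlation; it judges the correlation between
preceding and following frames, perceives the consistency between adjacent frames, and predicts the degree of spatio-temporal
consistency association of the target vehicle. The second branch consists of modules based on cross-feature pyramid fusion; it extracts
local information about the detected object, combining shallow spatial edge information with deep semantic features to obtain features
of the object's spatial position. The network combines the Transformer mechanism with the cross-feature pyramid module, exploiting the
Transformer's sensitivity to temporal correlation across long sequences and the feature pyramid module's sensitivity to edge information
to detect and track objects in video frames, which ensures long-range correlation between adjacent frames and deep fusion of edge and
deep-layer features. Experimental results show that the proposed dual-branch network achieves better accuracy and faster convergence
in video object tracking and detection; experiments on salient video object detection further demonstrate the algorithm's
effectiveness and generalization.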
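The abstract does not give the Transformer branch's equations. As a rough illustration only, the following sketch shows standard scaled dot-product attention applied across two frames: feature tokens from the current frame act as queries over tokens from the previous frame, and the resulting attention weights can be read as a frame-to-frame association score. The function names `cross_frame_attention` and `associate`, the toy feature vectors, and the argmax matching rule are hypothetical; learned query/key/value projections, multiple heads, and positional encodings are omitted, so this is not the authors' module.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[List[float]]]:
    """Scaled dot-product attention across frames.

    Each query (a token from frame t) attends over all keys/values
    (tokens from frame t-1). Returns the attended outputs and the
    attention-weight matrix (one row per query, each row sums to 1).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the previous frame's value vectors.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def associate(weights: List[List[float]]) -> List[int]:
    # Hypothetical matching rule: link each current-frame vehicle to the
    # previous-frame vehicle that receives the highest attention weight.
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


if __name__ == "__main__":
    prev_frame = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # toy features, 2 vehicles
    curr_frame = [[0.0, 0.9, 0.1, 0.0], [0.95, 0.0, 0.0, 0.1]]  # same vehicles, order swapped
    _, attn = cross_frame_attention(curr_frame, prev_frame, prev_frame)
    print(associate(attn))  # [1, 0]
```

Here the previous frame supplies both keys and values, so each output is a mixture of previous-frame features weighted by how strongly the current token correlates with them.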
Abstract (English):
Target perception in complex scenes is one of the most important research fields of deep learning in computer vision, and vehicle
detection in complex traffic scenes is studied by many researchers today. In video target detection, the temporal feature information
of moving objects is insufficiently exploited, so temporal features across long sequences are easily ignored. This paper proposes a
spatio-temporal consistent video vehicle detection and tracking algorithm. The algorithm has a two-branch network structure. One branch
consists of Transformer network modules based on spatial correlation; it determines the correlation between previous and subsequent
frames, perceives the consistency between adjacent frames, and predicts the spatio-temporal consistency (association degree) of the
target vehicle. The other branch consists of network modules based on cross-feature pyramid fusion; it extracts local information about
the detected object by combining shallow spatial edge information with high-level semantic feature information, yielding features of
the object's spatial position. The network combines the Transformer mechanism with the cross-feature pyramid module, exploiting the
Transformer's sensitivity to temporal correlation across long sequences and the feature pyramid module's sensitivity to edge
information to detect and track objects in video frames. This ensures long-range correlation between neighboring frames and deep fusion
of edge and deep-layer feature information. Experimental results show that the proposed dual-branch network achieves better accuracy
and faster convergence in video target tracking and detection. Experiments on salient video object detection further demonstrate the
effectiveness and generalization of the algorithm.
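The cross-feature pyramid fusion branch is likewise described only at a high level. The following is a minimal single-channel sketch, assuming a two-level pyramid in which a shallow, high-resolution map (edge detail) and a deep, half-resolution map (semantics) exchange information in both directions: the deep map is upsampled and added to the shallow one (top-down, as in a standard feature pyramid network), and the shallow map is average-pooled and added to the deep one (bottom-up). The names `cross_fuse`, `upsample2x`, `downsample2x`, and `add` are hypothetical, and a real module would typically use learned convolutions over many channels rather than plain addition.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a single-channel feature map (rows x columns)


def upsample2x(x: Grid) -> Grid:
    # Nearest-neighbour upsampling: every cell becomes a 2x2 block.
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are not aliased
    return out


def downsample2x(x: Grid) -> Grid:
    # 2x2 average pooling; assumes even height and width.
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    # Element-wise sum of two maps of the same shape.
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def cross_fuse(shallow: Grid, deep: Grid) -> Tuple[Grid, Grid]:
    """Two-way fusion of a shallow map and a deep map at half its resolution."""
    fused_shallow = add(shallow, upsample2x(deep))  # top-down: semantics -> fine grid
    fused_deep = add(deep, downsample2x(shallow))   # bottom-up: edges -> coarse grid
    return fused_shallow, fused_deep


if __name__ == "__main__":
    shallow = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]  # vertical edge in column 0
    deep = [[0.0, 1.0], [0.0, 1.0]]                     # semantic response on the right
    fine, coarse = cross_fuse(shallow, deep)
    print(fine[0])  # [1.0, 0.0, 1.0, 1.0]
    print(coarse)   # [[0.5, 1.0], [0.5, 1.0]]
```

After fusion, the fine map carries both the edge in column 0 and the semantic response on the right, while the coarse map gains a pooled trace of the edge; this two-way exchange is one plausible reading of "cross" fusion, not a confirmed description of the paper's module.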
|
|
|