SlowFast information fusion action recognition network based on deeply nested attention mechanism

Affiliation:

School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China

CLC number:

TP391; TN98

Fund Project:

Supported by the National Natural Science Foundation of China (62173078) and the Natural Science Foundation of Liaoning Province (2022-MS-268)

Abstract:

Video action recognition is widely used in fields such as video surveillance and autonomous driving, and the SlowFast network is one of its most commonly used architectures. Existing SlowFast-based networks use attention to enhance relevant information, but they insert the attention modules between the convolutional blocks of the network. Embedding attention deeper, into the individual convolutional layers inside each block, can further improve the information extraction capability of SlowFast. This paper first proposes a deeply nested attention mechanism built around SCTM, an attention module that extracts spatio-temporal and channel information, which strengthens the network's extraction of all three kinds of information (spatial, temporal, and channel). In addition, the information fused across streams in current multi-stream networks is not sufficiently interacted or processed. To address this, a multi-stream spatio-temporal information fusion network based on cross-attention and ConvLSTM is proposed so that the information in each stream interacts fully. The improved SlowFast network reaches 98.5% Top-1 accuracy on UCF101 and 80.1% accuracy on HMDB51, outperforming existing models and improving on the original SlowFast network by 2.64%. These results indicate that the SlowFast spatio-temporal information fusion network with deeply nested attention performs well in both information extraction and fusion.
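
The abstract names cross-attention as one ingredient of the multi-stream fusion but gives no implementation details. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which the tokens of one stream attend over the tokens of the other. It is not the paper's fusion module: it omits learned query/key/value projections and the ConvLSTM stage, and the names `cross_attention`, `slow_tokens`, and `fast_tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let every query token attend over all tokens of the other stream.

    Assumption for brevity: keys and values are the raw tokens themselves,
    with no learned projections. A residual connection adds the attended
    vector back onto the query token.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the other stream's tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(dim)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy features: 2 tokens for a slow stream, 4 tokens for a fast stream.
slow_tokens = [[1.0, 0.0], [0.0, 1.0]]
fast_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Bidirectional exchange: each stream is enriched with the other's content.
slow_enriched = cross_attention(slow_tokens, fast_tokens)
fast_enriched = cross_attention(fast_tokens, slow_tokens)
```

Each enriched stream keeps its own token count and feature size, so in a design like this the outputs could be passed on to a later temporal module; how the paper actually wires cross-attention to ConvLSTM is not stated in the abstract.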

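Cite this article

ZHANG Qiyao, SANG Haifeng. SlowFast information fusion action recognition network based on deeply nested attention mechanism[J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(3): 159-166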

History
  • Published online: 2024-05-23