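Zhao Ting, Cao Jiangtao, Ji Xiaofei. Two-person interaction behavior recognition based on a CNN-A-BLSTM network [J]. Journal of Electronic Measurement and Instrumentation, 2021, 35(11): 100-107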
CNN-A-BLSTM network for two-person interaction behavior recognition
Keywords: two-person interaction behavior recognition; deep learning; convolutional neural network; bidirectional long short-term memory network; attention mechanism
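Funding: Supported by the National Natural Science Foundation of China (61673199) and the Liaoning Province Science and Technology Public Welfare Research Fund (2016002006)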
Author          Institution
Zhao Ting       School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China
Cao Jiangtao    School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China
Ji Xiaofei      School of Automation, Shenyang Aerospace University, Shenyang 110136, China
Abstract:
      When joint data are combined with a convolutional neural network (CNN) for two-person interaction behavior recognition, the image-encoding step expresses the interaction information insufficiently and temporal relations cannot be modeled effectively. When joint data are combined with a recurrent neural network instead, the model emphasizes the representation of temporal information but neglects the spatial structure of the two-person interaction. To address both problems, a new model is proposed that combines a CNN with a bidirectional long short-term memory network augmented with an attention mechanism (the CNN-A-BLSTM network). First, the joints of each person are arranged according to a traversal tree structure. An interaction matrix is then built for every frame of the video; its entries are the Euclidean distances between all pairs of arranged joint coordinates of the two persons. Each matrix is encoded as a grayscale image, and the images are fed sequentially into the CNN to extract deep features, yielding a feature sequence. This sequence is passed to the A-BLSTM network for temporal modeling and finally to a Softmax classifier, which outputs the recognition result. Applied to the 11 classes of two-person interaction in the NTU RGB+D dataset, the new model reaches an accuracy of 90%, higher than that of current two-person interaction recognition algorithms, which verifies its effectiveness and good generalization performance.
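The per-frame interaction matrix and its grayscale encoding can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `interaction_matrix` and `to_grayscale`, the three-joint toy skeletons, and the per-frame min-max scaling to 0..255 are all assumptions made for illustration, and the joints are assumed to be already arranged in the traversal order. The abstract does not specify how distances are mapped to gray levels.

```python
import math


def interaction_matrix(joints_a, joints_b):
    """Pairwise Euclidean distances between two ordered joint lists.

    joints_a, joints_b: lists of (x, y, z) tuples, assumed to be already
    arranged in the chosen traversal order.
    Returns a len(joints_a) x len(joints_b) matrix (list of rows).
    """
    return [[math.dist(p, q) for q in joints_b] for p in joints_a]


def to_grayscale(matrix):
    """Map distances linearly to integer gray levels in 0..255.

    Per-frame min-max scaling is an assumption of this sketch; the
    encoding used in the paper may differ.
    """
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant matrix carries no contrast; emit an all-black image.
        return [[0 for _ in row] for row in matrix]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in matrix]


# Toy frame with three hypothetical joints per person.
person_a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
person_b = [(3.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 2.0, 4.0)]

dist = interaction_matrix(person_a, person_b)
image = to_grayscale(dist)
```

In a full pipeline, one such image per frame would be passed to the CNN, and the resulting feature sequence to the A-BLSTM network; those stages are omitted here because the abstract gives no architectural details.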