Wang Jingyao, Wang Hongjun. Gesture key point extraction method based on Mask R-CNN and SG filter [J]. Journal of Electronic Measurement and Instrumentation, 2021, 35(9): 41-48.
Gesture key point extraction method based on Mask R-CNN and SG filter
Keywords: computer vision; gesture recognition; key point extraction; Mask R-CNN; Savitzky-Golay filter
Funding: Ministry of Science and Technology of China project (G20190201031); Beijing Information Science and Technology University key research cultivation project for scientific research development (2020KYNH226)
Author affiliations:
Wang Jingyao: 1. College of Artificial Intelligence and Automation, Beijing University of Technology; 2. Beijing International Science and Technology Cooperation Base for Intelligent Sensing and Control of High-end Equipment, Beijing Information Science and Technology University
Wang Hongjun: 2. Beijing International Science and Technology Cooperation Base for Intelligent Sensing and Control of High-end Equipment, Beijing Information Science and Technology University; 3. School of Mechanical and Electrical Engineering, Beijing Information Science and Technology University
Chinese abstract (translated):
      Gesture recognition is an important means of human-computer interaction. To recognize gestures accurately while rejecting environmental interference such as lighting, and to suppress the severe key point jitter caused by the high-dimensional motion of the hand, a gesture key point extraction method based on the Mask Region-based Convolutional Neural Network (Mask R-CNN) and the Savitzky-Golay (SG) polynomial smoothing filter is proposed. The method first performs feature extraction and region segmentation on the input RGB three-channel image to obtain an instance segmentation and mask of the hand. RoIAlign and the functional network are then used for target matching, marking 22 key points (21 skeleton points plus 1 background point). The marked results are fed into an SG filter for data smoothing, and the skeleton points are re-calibrated, yielding stable gesture features. Comparative experiments show that the method rejects environmental interference to the greatest extent and extracts key points accurately. Compared with traditional gesture key point extraction based on contour segmentation, the robustness of the model is greatly improved, and the recognition accuracy reaches 93.48%.
English abstract:
      Gesture recognition is an important means of human-computer interaction. To recognize gestures more accurately, eliminate the interference of environmental conditions such as lighting, and at the same time reduce the key point jitter caused by the high-dimensional spatial motion of the hand, a gesture key point extraction method based on the Mask R-CNN model and the Savitzky-Golay filter is proposed. The method uses the Mask R-CNN model to process RGB three-channel images, performing object recognition and segmentation on each image to obtain the 21 bone points and background position of the hand, and trains the model on these results. Neural network features are then used to match the video stream and mark 22 key points. The point data is smoothed with a Savitzky-Golay filter and redrawn to obtain stable gesture extraction and reconstruction results. The method is applied in bone point extraction experiments. Experimental results show that it eliminates environmental interference to the greatest extent and accurately extracts key points. Compared with traditional gesture key point extraction based on contour segmentation, the accuracy reaches 93.48%, and the robustness of the model is greatly improved.
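The Savitzky-Golay smoothing step described in the abstract can be sketched as follows. This is a minimal illustration using SciPy's `savgol_filter` on a simulated single-key-point trajectory; the signal, noise level, window length, and polynomial order are all illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

# Simulated x-coordinate of one hand key point over 100 frames:
# a smooth underlying motion plus random jitter standing in for the
# frame-to-frame key point jitter described in the abstract.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 100)
clean = np.sin(t)
noisy = clean + rng.normal(scale=0.15, size=t.size)

# Savitzky-Golay smoothing fits a low-order polynomial in a sliding
# window; window_length=11 and polyorder=3 are illustrative choices.
smooth = savgol_filter(noisy, window_length=11, polyorder=3)

# For a whole sequence of shape (frames, 22, 2) -- 22 key points with
# (x, y) coordinates per frame -- the filter can be applied along the
# time axis: savgol_filter(seq, 11, 3, axis=0).
```

Because the filter fits a local polynomial rather than simply averaging, it damps high-frequency jitter while preserving the shape of the underlying motion better than a plain moving average.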