Journal of Electronic Measurement and Instrumentation

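Wang Jingyao, Wang Hongjun. Gesture key point extraction method based on Mask R-CNN and SG filter[J]. Journal of Electronic Measurement and Instrumentation, 2021, 35(9): 41-48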

Gesture key point extraction method based on Mask R-CNN and SG filter

DOI：

Keywords: computer vision; gesture recognition; key point extraction; Mask R-CNN; Savitzky-Golay filter

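Funding: Supported by a project of the Ministry of Science and Technology of China (G20190201031) and the Beijing Information Science and Technology University key research cultivation project for the connotative development of scientific research (2020KYNH226)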

Author	Institution
Wang Jingyao	1. College of Artificial Intelligence and Automation, Beijing University of Technology; 2. Intelligent Sensing and Control of High-end Equipment Beijing International Science and Technology Cooperation Base, Beijing Information Science and Technology University
Wang Hongjun	2. Intelligent Sensing and Control of High-end Equipment Beijing International Science and Technology Cooperation Base, Beijing Information Science and Technology University; 3. School of Mechanical and Electrical Engineering, Beijing Information Science and Technology University

Abstract:

Gesture recognition is an important means of human-computer interaction. To recognize gestures accurately, suppress environmental interference such as lighting, and reduce the severe key point jitter caused by high-dimensional hand motion, a gesture key point extraction method based on the mask region-based convolutional neural network (Mask R-CNN) and the Savitzky-Golay (SG) polynomial smoothing filter is proposed. The method first performs feature extraction and region segmentation on the input red-green-blue (RGB) three-channel image to obtain the instance segmentation and mask of the hand. It then uses ROIAlign and functional networks to match targets in the video stream and marks 22 key points (21 bone points and 1 background point). The marked results are fed into the SG filter for data smoothing, and the bone points are recalibrated to obtain stable gesture features. Comparative experiments show that the method suppresses environmental interference to the greatest extent and extracts key points accurately. Compared with traditional gesture key point extraction based on contour segmentation, the robustness of the model is greatly improved, and the recognition accuracy reaches 93.48%.
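The smoothing step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window length or polynomial order, so the 5-frame window with a quadratic fit, the helper names `sg_smooth` and `smooth_keypoint_track`, and the sample track are all assumptions made for illustration. The sketch applies the standard Savitzky-Golay weights (-3, 12, 17, 12, -3)/35 to the x and y coordinates of one key point across consecutive frames.

```python
from typing import List, Sequence, Tuple

# Savitzky-Golay convolution weights for a 5-sample window and a
# quadratic local polynomial fit: (-3, 12, 17, 12, -3) / 35.
# Window length and order are assumptions; the abstract does not give them.
SG_WEIGHTS_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

Point = Tuple[float, float]


def sg_smooth(values: Sequence[float]) -> List[float]:
    """Smooth a 1-D sequence; the first and last two samples are left as-is."""
    half = len(SG_WEIGHTS_5) // 2
    out = [float(v) for v in values]
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(w * v for w, v in zip(SG_WEIGHTS_5, window))
    return out


def smooth_keypoint_track(track: Sequence[Point]) -> List[Point]:
    """Smooth one key point's (x, y) positions across consecutive frames."""
    xs = sg_smooth([p[0] for p in track])
    ys = sg_smooth([p[1] for p in track])
    return list(zip(xs, ys))


# Hypothetical track of one key point: steady motion in x, one jitter spike in y.
track = [(0.0, 10.0), (1.0, 10.0), (2.0, 15.0), (3.0, 10.0), (4.0, 10.0)]
smoothed = smooth_keypoint_track(track)
```

In this sketch the jitter spike in the middle frame is pulled toward its neighbours while the steady motion in x is preserved, because a Savitzky-Golay filter reproduces any signal that is locally a polynomial of the fitted order. A full pipeline would apply the same smoothing to each of the 21 bone points independently.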
