Abstract: Wi-Fi sensing has become a research hotspot in the field of perception because it enables intelligent sensing of human activities and the surrounding environment. Existing wireless sensing models have large parameter counts, which makes real-time sensing difficult in scenarios with limited computing power, such as mobile devices. To address this, a classification and recognition model is proposed that combines a lightweight feature extraction module based on depthwise separable convolution with a stacked gated recurrent unit (GRU) network. First, the lightweight feature extraction module captures the spatial features of human gestures while leaving the temporal dimension of the features unchanged; then a two-layer stacked GRU network learns the spatio-temporal features of the gestures. Finally, the model is validated on the open-source Widar dataset: body-coordinate velocity profile (BVP) features extracted from the channel state information (CSI) are used to improve recognition accuracy in cross-domain scenarios, and a weighted loss function is used to address the sample imbalance problem. The results show that the proposed model achieves an accuracy of 77.6% in cross-domain scenarios with only 236.891 K parameters. Compared with existing Wi-Fi gesture recognition models, the proposed model greatly reduces the parameter count and computational complexity while keeping recognition performance essentially unchanged, which lays a foundation for the adoption of Wi-Fi sensing in practical applications.
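The parameter saving from depthwise separable convolution can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the function names and the layer sizes (16 input channels, 32 output channels, a 3 x 3 kernel) are hypothetical and are not taken from the proposed model. It compares the parameter count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2-D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Parameter count of a depthwise k x k convolution followed by
    a 1 x 1 pointwise convolution."""
    # Depthwise step: one k x k filter per input channel.
    depthwise = c_in * k * k + (c_in if bias else 0)
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 16, 32, 3
standard = conv2d_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(f"standard conv:  {standard} parameters")
print(f"separable conv: {separable} parameters")
print(f"reduction:      {standard / separable:.2f}x")
```

For these assumed sizes the standard layer has 4640 parameters and the separable layer has 704, a reduction of roughly 6.6 times. The saving grows with the kernel size and the number of output channels.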