Abstract: To address the limitation that traditional gait recognition methods neglect temporal information in gait features, we propose a gait recognition framework that integrates a 3D-CBAM attention module with cross-temporal-scale feature analysis. The attention module allows the model to adaptively focus on critical channels and spatial locations within the input gait sequences, improving recognition performance. In addition, the Enhanced Global and Local Feature Extractor (EGLFE) partially decouples temporal and spatial information during global feature extraction: inserting additional LeakyReLU layers between the 2D and 1D convolutions increases the number of nonlinearities in the network, which helps enlarge the receptive field during gait feature extraction and strengthens the model's feature-learning capacity, yielding better global features. Local features are also fused in to compensate for the information lost through partitioning. A multi-scale temporal enhancement module then fuses frame-level features with short- and long-term temporal features, improving the model's robustness to occlusion. We trained and evaluated the framework on the CASIA-B and OU-MVLP datasets. On CASIA-B, the average recognition accuracy reached 92.7%, with rank-1 accuracies of 98.1%, 95.1%, and 84.9% under the Normal (NM), Bag (BG), and Clothing (CL) conditions, respectively. The experimental results demonstrate that the proposed method performs well under both normal walking and more challenging conditions.
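To make the attention mechanism concrete, below is a minimal NumPy sketch of the channel-attention branch of a CBAM-style module applied to a 3D gait feature volume of shape (C, T, H, W). It follows the standard CBAM formulation (shared two-layer MLP over average- and max-pooled channel descriptors, sigmoid gating); the function and weight names are illustrative assumptions, not the paper's actual implementation, and the spatial branch is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_3d(x, w1, w2):
    """CBAM-style channel attention over a (C, T, H, W) feature volume.

    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights
    (r is the channel-reduction ratio); both are shared between the
    average-pooled and max-pooled descriptors, as in CBAM.
    """
    c = x.shape[0]
    avg = x.reshape(c, -1).mean(axis=1)   # global average pool -> (C,)
    mx = x.reshape(c, -1).max(axis=1)     # global max pool -> (C,)
    # Shared MLP with ReLU in the bottleneck, summed then gated.
    attn = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                   + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,) in (0, 1)
    # Rescale each channel of the input volume.
    return x * attn[:, None, None, None]

# Hypothetical usage: 8 channels, 4 frames, 16x16 spatial maps, r = 4.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 16, 16))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
out = channel_attention_3d(feat, w1, w2)
```

Because the sigmoid gate lies in (0, 1), the output keeps the input's shape while attenuating less informative channels, which is the adaptive channel focus the framework relies on.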