Abstract: To address the technical challenges posed by the varying sizes, haphazard arrangement, and overlapping occlusion of prohibited items in security X-ray images, we propose an enhanced multi-scale feature fusion network model based on HRNet, aimed at automatic segmentation and recognition of prohibited items. In the encoding stage, we leverage the multi-resolution parallel architecture of HRNet to extract multi-scale features, addressing the diverse scales of prohibited items in security X-ray images. In the decoding stage, a multi-level feature aggregation module is introduced that replaces bilinear interpolation with data-dependent upsampling, reducing information loss during aggregation and thereby preserving a more complete representation of the features extracted in the encoding stage. In the overall network architecture, an attention-based de-occlusion module is embedded to strengthen the model's edge-awareness, alleviate the severe overlapping occlusion of items in security X-ray images, and improve segmentation and recognition accuracy. Experiments on the public PIDray security-inspection dataset show that the model achieves mean intersection-over-union (mIoU) scores of 73.15%, 69.47%, and 58.33% on the Easy, Hard, and Hidden validation subsets, improvements of 0.49%, 1.17%, and 5.69%, respectively, with the overall mIoU improved by about 2.45%.
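
To make the decoder change concrete, the following is a minimal sketch of a data-dependent upsampling layer of the kind the abstract contrasts with bilinear interpolation: a learned 1x1 projection maps each low-resolution feature vector to a patch of class scores, which is then rearranged to full resolution. The class/parameter names, channel counts, and scale factor are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DUpsampling(nn.Module):
    """Data-dependent upsampling (sketch): a learned linear reconstruction
    from each low-resolution feature vector to a (scale x scale) patch of
    per-class scores, rearranged into a high-resolution prediction map."""

    def __init__(self, in_channels: int, num_classes: int, scale: int = 4):
        super().__init__()
        self.scale = scale
        # 1x1 convolution realises the data-dependent linear projection W.
        self.proj = nn.Conv2d(in_channels, num_classes * scale * scale, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                          # (N, C*s*s, H, W)
        # Rearrange channel groups into spatial positions (pixel shuffle).
        return F.pixel_shuffle(x, self.scale)     # (N, C, H*s, W*s)


if __name__ == "__main__":
    feat = torch.randn(1, 48, 64, 64)             # hypothetical low-resolution decoder feature
    up = DUpsampling(in_channels=48, num_classes=12, scale=4)
    print(up(feat).shape)                          # torch.Size([1, 12, 256, 256])
```

Unlike bilinear interpolation, whose weights are fixed, the projection here is trained end-to-end, which is the property the abstract relies on to reduce information loss when aggregating multi-level features.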