Abstract: To address the loss of detail information and the blurring of salient-target contours that occur during deep feature extraction in existing infrared and visible image fusion models, we propose an infrared and visible image fusion method that combines semantic segmentation with cross-modality differential feature compensation (CMDFC). By incorporating a CMDFC module with a convolutional block attention module (CBAM), complementary features from the two modalities are integrated into the original features for deep feature extraction. In addition, a semantic segmentation network performs pixel-level classification of the fused image, and a semantic loss is constructed to constrain the fusion network; a decoder then reconstructs the fused image. Experimental results on public datasets show that, compared with the best scores of the reference models, the proposed model improves all five selected metrics to varying degrees, with mutual information (MI) and visual information fidelity (VIF) increased by 4.41% and 4.25%, respectively. These results indicate that the proposed model generates clearer fused images that correlate more strongly with the source images, effectively mitigating the loss of feature detail during fusion and enhancing the visual quality and contrast of the generated images.
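The abstract does not fix an implementation, but the CMDFC idea it describes (attending over the cross-modal feature difference and adding the result back to each branch as compensation) can be sketched roughly as below. This is a minimal illustration, not the paper's code: module and tensor names are hypothetical, and the CBAM block follows the standard channel-then-spatial attention design rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Standard CBAM: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)              # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))    # spatial attention


class CMDFC(nn.Module):
    """Hypothetical sketch: attend over the cross-modal feature
    difference and add it back to each branch as compensation."""
    def __init__(self, channels):
        super().__init__()
        self.cbam = CBAM(channels)

    def forward(self, feat_ir, feat_vis):
        # The differential features carry what one modality has
        # and the other lacks.
        comp_ir = self.cbam(feat_vis - feat_ir)   # what VIS offers IR
        comp_vis = self.cbam(feat_ir - feat_vis)  # what IR offers VIS
        return feat_ir + comp_ir, feat_vis + comp_vis


# Usage: compensated features keep the input shapes and can feed
# the subsequent deep feature-extraction layers.
ir = torch.randn(1, 64, 128, 128)
vis = torch.randn(1, 64, 128, 128)
ir_c, vis_c = CMDFC(64)(ir, vis)
```

Under this reading, the compensated features (rather than the raw per-modality features) are what enter the deeper extraction layers, which is how the abstract's claim of mitigating detail loss during deep feature extraction would be realized.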