Abstract: Deep reinforcement learning visual navigation algorithms lose navigation accuracy, real-time performance, and image-matching reliability when the navigation scene changes. To address this problem, a visual navigation model that incorporates a split-attention mechanism and next expected observation (NEO) is proposed. First, the features of the current and target states are extracted with a ResNeSt50 backbone network to reduce network redundancy. Shallow target feature information is then captured more densely through cross-stage partial connections (CSP) to enhance the learning ability of the model. Finally, an improved loss function is proposed that brings the inference network closer to the true posterior, so that the agent can make the best decision in its current environment and the navigation accuracy of the model improves further across different scenarios. Training and testing are conducted on the AVD dataset and AI2-THOR scenes. The experimental results show that the proposed algorithm reaches a navigation accuracy of up to 96.8%, with an average improvement of about 3% in success rate (SR) and about 6% in success weighted by path length (SPL), which meets the requirements for navigation accuracy and real-time matching.
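The abstract reports SR and SPL but does not define them. As a minimal sketch, assuming the definitions commonly used in embodied navigation evaluation (SR as the fraction of successful episodes, SPL as success weighted by the ratio of shortest to actual path length), the two metrics could be computed as follows. The function names and the sample episodes are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length (assumed standard definition).

    For each episode i the score is S_i * l_i / max(p_i, l_i), where S_i is
    the success flag, l_i the shortest path length from start to target, and
    p_i the length of the path the agent actually took. The result is the
    mean score over all episodes.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            # Assumption: the start never coincides with the target.
            raise ValueError("shortest path length must be positive")
        if ok:
            total += shortest / max(taken, shortest)
    return total / n


if __name__ == "__main__":
    # Hypothetical episodes: two successes (one with a detour), one failure.
    ok = [True, False, True]
    shortest = [2.0, 3.0, 4.0]
    taken = [4.0, 5.0, 4.0]
    print("SR :", success_rate(ok))
    print("SPL:", spl(ok, shortest, taken))
```

In this made-up example two of three episodes succeed, and the detour in the first episode halves its contribution, so SPL comes out lower than SR. This is why the two metrics are usually reported together.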