Abstract: With the application of deep learning methods to image processing, image-related intelligent interaction technologies have developed rapidly. Visual question answering (VQA) gathers information from an image by asking questions related to it, with the ultimate aim of enriching image understanding. Through a comprehensive analysis and comparison of VQA methods from recent years, this paper divides them into four types according to model structure: basic models, attention-based models, modular models, and external knowledge base models. It also points out, from three aspects, directions for visual and semantic information processing and for future research on visual reasoning in VQA.