Survey of visual question answering for intelligent interaction
Keywords: visual question answering; image comprehension; computer vision; natural language processing
Authors: Yang Rui (杨睿), Liu Ruijun (刘瑞军), Shi Yuqian (师于茜), Li Shanxi (李善玺)
Affiliations (all authors): 1. School of Computer and Information Engineering, Beijing Technology and Business University; 2. Beijing Key Laboratory of Big Data Technology for Food Safety
      With the application of deep learning methods to image processing, image-related intelligent interaction technology has developed rapidly. Visual question answering (VQA) gathers information about an image by asking questions related to it, with the ultimate aim of enriching image understanding. Based on a comprehensive analysis and comparison of VQA methods from recent years, this survey divides them into four types according to model structure: basic models, attention-based models, modular models, and external knowledge base models. It also points out, from three aspects, directions for future research on visual and semantic information processing and on visual reasoning in VQA.