Journal of Electronic Measurement and Instrumentation

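Yang Rui, Liu Ruijun, Shi Yuqian, Li Shanxi. Survey of visual question answering for intelligent interaction[J]. Journal of Electronic Measurement and Instrumentation, 2019, 33(2): 117-124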

Survey of visual question answering for intelligent interaction

DOI:

Keywords: visual question answering; image comprehension; computer vision; natural language processing

Funding: Supported by the 2018 Graduate Research Capability Improvement Program

Author	Institution
Yang Rui	1. School of Computer and Information Engineering, Beijing Technology and Business University; 2. Beijing Key Laboratory of Big Data Technology for Food Safety
Liu Ruijun	1. School of Computer and Information Engineering, Beijing Technology and Business University; 2. Beijing Key Laboratory of Big Data Technology for Food Safety
Shi Yuqian	1. School of Computer and Information Engineering, Beijing Technology and Business University; 2. Beijing Key Laboratory of Big Data Technology for Food Safety
Li Shanxi	1. School of Computer and Information Engineering, Beijing Technology and Business University; 2. Beijing Key Laboratory of Big Data Technology for Food Safety

Abstract:

As deep learning methods are increasingly applied to image processing, image-related intelligent interaction technology has also developed rapidly. Visual question answering (VQA) for intelligent interaction gathers information about an image by posing questions about its content, with the ultimate aim of enriching image understanding. Based on a comprehensive analysis and comparison of recent VQA methods, this survey divides them by model structure into four types: basic models, attention mechanism models, modular models, and models based on external knowledge bases. It also points out, from three aspects, directions for the processing of visual and semantic information in VQA and for future research on visual reasoning.
