Liu Tie, Duan Yong. Robot indoor scene recognition based on fusion of CNN and Transformer[J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(5): 223-229
Robot indoor scene recognition based on fusion of CNN and Transformer
Keywords: CNN; Transformer; robot; scene recognition; local feature
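Funding: Supported by the Liaoning Province Support Program for Outstanding Scientific and Technological Talents in Higher Education Institutions (LR15045) and the General Project of Scientific Research Funds of the Education Department of Liaoning Province (LJKZ0139)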
Abstract:
To improve the accuracy of robot scene recognition in complex indoor environments, this paper proposes a robot indoor scene
recognition model that fuses a convolutional neural network (CNN) with a visual Transformer structure. The model uses the CNN to
extract local features of the scene and then uses the visual Transformer structure to capture long-range dependencies among those
features. The proposed visual Transformer structure consists of three parts: a feature encoding structure (Attention Embedding), an
Encoder structure, and a structure that converts high-level semantic features into pixel-level features (Attention Project). In the
resulting model, the CNN improves the visual Transformer's ability to describe local detail features, while the visual Transformer
helps the CNN build dependencies between distant features, so that the visual features of images of the robot's working scenes can be
represented and used effectively. Finally, experiments on a dataset collected by the robot in its actual working environment and on the
open-source COLD dataset verify the effectiveness of the model, which achieves higher scene recognition accuracy.
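
The data flow described in the abstract (CNN local features, then token encoding, attention over all positions, and projection back to pixel-level features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract gives no layer sizes, weights, or fusion rule, so every function name below is hypothetical, the attention uses identity query/key/value projections with no learned parameters, and the element-wise sum used to fuse local and global features is an assumption made only for illustration.

```python
import math


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation; stands in for CNN local feature extraction."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def maps_to_tokens(maps):
    """(C, H, W) feature maps -> H*W tokens of dimension C (stand-in for Attention Embedding)."""
    channels, height, width = len(maps), len(maps[0]), len(maps[0][0])
    tokens = [
        [maps[c][i][j] for c in range(channels)]
        for i in range(height) for j in range(width)
    ]
    return tokens, (height, width)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (stand-in for the Encoder)."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)  # one weight per position, near or far
        weights.append(row)
        outputs.append([
            sum(row[n] * tokens[n][c] for n in range(len(tokens)))
            for c in range(dim)
        ])
    return outputs, weights


def tokens_to_maps(tokens, shape):
    """Tokens -> (C, H, W) feature maps (stand-in for Attention Project)."""
    height, width = shape
    dim = len(tokens[0])
    return [
        [[tokens[i * width + j][c] for j in range(width)] for i in range(height)]
        for c in range(dim)
    ]


def cnn_transformer_features(image, kernels):
    """Local CNN features plus attention-derived global features (sum fusion is an assumption)."""
    local_maps = [conv2d_valid(image, k) for k in kernels]
    tokens, shape = maps_to_tokens(local_maps)
    attended, weights = self_attention(tokens)
    global_maps = tokens_to_maps(attended, shape)
    fused = [
        [[l + g for l, g in zip(l_row, g_row)] for l_row, g_row in zip(l_map, g_map)]
        for l_map, g_map in zip(local_maps, global_maps)
    ]
    return fused, weights


# Toy input: a 4x4 "image" and two 2x2 kernels (values chosen arbitrarily).
image = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
kernels = [[[1, 0], [0, -1]], [[0.25, 0.25], [0.25, 0.25]]]
fused, weights = cnn_transformer_features(image, kernels)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

Each row of `weights` assigns a nonzero weight to every spatial position, which is the sense in which attention lets a feature depend on distant parts of the image, whereas each convolution output only sees its own small window.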