Li Guoyan, Tian Mingda, Dong Chunhua, Hao Zhipeng. Structured image description network for remote sensing images [J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(5): 148-157.
Structured image description network for remote sensing images
  
Keywords: remote sensing; image caption; geo-relation; semantic segmentation; attention mechanism
Funding: supported by the National Natural Science Foundation of China (52178295)
Author Institutions
Li Guoyan School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
Tian Mingda School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
Dong Chunhua School of Geology and Surveying, Tianjin Chengjian University, Tianjin 300384, China
Hao Zhipeng School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
Abstract:
      To address the limitation of standard attention mechanisms, which can only generate coarse-grained attention regions, failing both to capture the geographical relationships between remote sensing objects and to fully exploit the semantic content of remote sensing images, a structured image description network named GRSRC (geo-object relational segmentation for remote sensing image captioning) is proposed. Firstly, considering the highly structured nature of remote sensing image features, a feature extraction method based on structured semantic segmentation of remote sensing images is introduced, enhancing the encoder's feature extraction capability for more accurate representation. Simultaneously, an attention mechanism is incorporated to weight the segmented regions, enabling the model to focus more on crucial semantic information. Secondly, taking advantage of the well-defined spatial relationships among objects in remote sensing images, geographical spatial relations are integrated into the attention mechanism, ensuring more accurate and spatially consistent descriptions. Finally, experimental evaluations are conducted on three publicly available remote sensing datasets: RSICD, UCM, and Sydney. On the UCM dataset, BLEU-1 achieved 84.06, METEOR reached 44.35, and ROUGE_L attained 77.01, demonstrating improvements of 2.32%, 1.15%, and 1.88%, respectively, over the classical models compared. The experimental results indicate that the model makes better use of the semantic content of remote sensing images, demonstrating its strong performance in remote sensing image captioning tasks.
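The paper's exact attention formulation is not reproduced on this page. As a minimal illustrative sketch of the core idea (attention-weighted pooling over segmented region features), an additive attention step could look like the following; all function names, weight matrices, and shapes here are hypothetical, and the paper's fusion of geo-spatial relations into the scores is only noted in a comment, not modeled:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def region_attention(region_feats, hidden, Wr, Wh, v):
    """Additive (Bahdanau-style) attention over segmented region features.

    region_feats: (n_regions, d) features, one row per segmented region
    hidden:       (d,) decoder hidden state at the current step
    Wr, Wh:       (k, d) projection matrices; v: (k,) scoring vector
    Returns the attended context vector and the region weights.
    In GRSRC the scores would additionally be modulated by geo-spatial
    relations between regions; that term is omitted in this sketch.
    """
    scores = np.tanh(region_feats @ Wr.T + hidden @ Wh.T) @ v  # (n_regions,)
    weights = softmax(scores)                                  # sum to 1
    context = weights @ region_feats                           # (d,)
    return context, weights

# toy usage with random features: 5 regions, d=8, k=6
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
hidden = rng.normal(size=8)
Wr, Wh, v = rng.normal(size=(6, 8)), rng.normal(size=(6, 8)), rng.normal(size=6)
context, weights = region_attention(feats, hidden, Wr, Wh, v)
```

The context vector would then feed the caption decoder at each generation step, so regions with higher weights dominate the emitted word.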