Fine-grained feature enhancement unsupervised person re-identification method based on ViT

Affiliation: Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

CLC number: TP391.4; TN911.7

Fund project: Supported by the National Natural Science Foundation of China (62173160)

Abstract:

Person re-identification (Re-ID) can be regarded as a fine-grained visual classification task. Existing unsupervised person Re-ID methods typically focus only on global features of the human body and fail to capture accurate fine-grained local features, which limits recognition accuracy. To address this issue, we propose a ViT-based fine-grained feature enhancement network. The network uses a vision-language model to generate masks for local regions of the human body in each image. Based on the different interaction strategies between learnable tokens and image patches in the self-attention mechanism, the class token and the newly introduced learnable local tokens learn global and local fine-grained feature representations, respectively. To further improve feature representation, a spatial information enhancement module is designed, which strengthens feature learning by mining the spatial contextual relationships among representative image patches within local body regions. Finally, using the extracted global and local fine-grained features, online and offline camera-aware contrastive losses are computed separately to improve the model's robustness to person identity in the unsupervised setting. Experiments on the Market-1501, MSMT17, and PersonX datasets validate the effectiveness of the proposed method, which achieves mAP/Rank-1 of 90.3%/95.9%, 59.2%/83.5%, and 91.3%/96.1%, respectively.
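
The abstract says that the class token and the local tokens interact with image patches under different strategies. One plausible reading, which is an assumption here and not confirmed by the abstract, is that the class token attends to all patches while each local token attends only to the patches inside its body-part mask. The NumPy sketch below illustrates that reading with a single attention step; the function name `masked_token_attention`, the two-part mask layout, and the omission of learned query/key/value projections are all hypothetical simplifications, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def masked_token_attention(tokens, patches, allow):
    """Single-head attention from learnable tokens to image patches.

    tokens:  (T, D) query tokens, e.g. one class token plus local tokens
    patches: (N, D) patch embeddings, used directly as keys and values
             (assumption: no learned projections, to keep the sketch short)
    allow:   (T, N) boolean; allow[t, n] is True if token t may attend to
             patch n. Every row must allow at least one patch.
    """
    dim = tokens.shape[1]
    scores = tokens @ patches.T / np.sqrt(dim)   # (T, N) scaled dot products
    scores = np.where(allow, scores, -np.inf)    # block disallowed patches
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ patches, weights

rng = np.random.default_rng(0)
num_patches, dim = 8, 4
patches = rng.normal(size=(num_patches, dim))
tokens = rng.normal(size=(3, dim))  # row 0: class token, rows 1-2: local tokens

# Hypothetical part masks: patches 0-3 are "upper body", 4-7 are "lower body".
upper = np.arange(num_patches) < 4
allow = np.stack([np.ones(num_patches, dtype=bool), upper, ~upper])

features, weights = masked_token_attention(tokens, patches, allow)
```

In this sketch `features[0]` aggregates all patches (a global feature), while `features[1]` and `features[2]` aggregate only the patches inside their own masks (local features). In the paper the masks are generated by a vision-language model; here they are hard-coded for illustration.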

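Cite this article

Cheng Siyu (程思雨), Chen Ying (陈莹). Fine-grained feature enhancement unsupervised person re-identification method based on ViT [J]. Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报), 2024, 38(9): 24-35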

History
  • Online publication date: 2024-12-02