
Computer Engineering ›› 2025, Vol. 51 ›› Issue (4): 293-302. doi: 10.19678/j.issn.1000-3428.0068921

• Development Research and Engineering Applications •


Cloth-Changing Person Re-Identification Method Based on CLIP Enhanced Fine-Grained Features

GENG Xia, WANG Yao*()   

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212000, Jiangsu, China
  • Received:2023-11-29 Online:2025-04-15 Published:2025-04-18
  • Contact: WANG Yao


Abstract:

Cloth-Changing person Re-Identification (CC-ReID) aims to identify target pedestrians wearing different outfits. Existing methods incorporate additional information (such as contours, gait, and 3D information) to assist the model in learning the clothing-agnostic features of pedestrians. However, owing to factors such as lighting and pose variations, the extracted biometric features may contain errors. To enhance accuracy, this paper explores the application of Contrastive Language-Image Pre-training (CLIP) to this task and proposes CLIP-driven Fine-grained Feature Enhancement (CFFE) for CC-ReID. This method first models the potential intrinsic relationship between the class text features and image features extracted by CLIP. It then introduces a saliency feature retention module and a saliency feature guidance module. The saliency feature retention module uses attention masks to locate foreground regions relevant to clothing and erases the corresponding features, so that the network focuses on effective non-clothing features. The saliency feature guidance module then attends to the important local and global features of pedestrians through attention mechanisms. The CFFE method achieves detection accuracies of 42.1%, 71.1%, and 89.9% on the LTCC, PRCC, and VC-Clothes datasets, respectively. Compared with algorithms such as AIM and CAL, CFFE extracts more fine-grained features and shows significant improvements across multiple metrics.
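The erasure step of the saliency feature retention module can be illustrated with a minimal NumPy sketch: an attention map is assumed to score how clothing-salient each spatial position is, and the most salient fraction of positions is zeroed out so the remaining features emphasize non-clothing cues. The shapes, the `keep_ratio` threshold, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def erase_salient_regions(features, attention, keep_ratio=0.7):
    """Toy sketch of attention-mask-based feature erasure.

    features  : (N, C) array of N patch features with C channels
    attention : (N,) saliency scores, assumed high on clothing regions
    keep_ratio: fraction of least-salient positions to keep (assumption)
    """
    # Positions above the keep_ratio quantile are treated as
    # clothing-related foreground and erased (set to zero).
    threshold = np.quantile(attention, keep_ratio)
    keep_mask = (attention <= threshold).astype(features.dtype)
    return features * keep_mask[:, None]

# Toy example: 6 patches, 4 channels each.
feats = np.ones((6, 4))
attn = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.4])
erased = erase_salient_regions(feats, attn, keep_ratio=0.7)
# Patches 1 and 3 (highest attention) are zeroed; the rest are kept.
```

In the paper, the erased feature map would then be passed onward so that subsequent layers (and the saliency feature guidance module) operate on clothing-independent information; this sketch only shows the masking mechanics.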

Key words: Cloth-Changing person Re-Identification (CC-ReID), Contrastive Language-Image Pre-training (CLIP), feature retention strategy, attention mechanisms, semantic parsing