Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 215-222, 231. doi: 10.19678/j.issn.1000-3428.0064971

• Graphics and Image Processing •

结合语义与图像信息的行人属性识别算法

杨祖赫1, 黎智辉2,*, 唐云祁1, 晏于文2, 宋华青2   

  1. 中国人民公安大学 侦查学院, 北京 100038
  2. 公安部物证鉴定中心, 北京 100038
  • 收稿日期:2022-06-13 出版日期:2023-08-15 发布日期:2023-08-15
  • 通讯作者: 黎智辉
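  • Author biographies:

    Zuhe YANG (born 1998), male, master's student; main research interests: image recognition and deep learning

    Yunqi TANG, associate professor, Ph.D.

    Yuwen YAN, assistant researcher, M.S.

    Huaqing SONG, assistant researcher, Ph.D.

  • Funding:
    National Key Research and Development Program of China (2021YFF0602102); Technology Research Program of the Ministry of Public Security (2019JSYJA06); Basic Research Special Project of the Institute of Forensic Science of China (2022JB024)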

Pedestrian Attribute Recognition Algorithm Combining Semantic and Image Information

Zuhe YANG1, Zhihui LI2,*, Yunqi TANG1, Yuwen YAN2, Huaqing SONG2   

  1. School of Investigation, People's Public Security University of China, Beijing 100038, China
  2. Institute of Forensic Science of China, Beijing 100038, China
  • Received: 2022-06-13 Online: 2023-08-15 Published: 2023-08-15
  • Contact: Zhihui LI

摘要:

为提升行人属性的识别精度,充分利用行人属性间自然语义关联并解决不同属性相关图像信息的提取差问题,提出结合语义与图像信息的行人属性识别算法。通过自注意力机制的关系建模能力挖掘行人属性间的内在联系,利用交叉注意力机制建立属性间语义信息与图像特征信息的关系。在此基础上,依靠卷积融合图像的高阶与低阶特征并为模块增加局部特征信息,提升模型的泛化能力,通过设计属性预测模块,使模型可与任意骨干网络相拼接,进一步提升识别性能。实验结果显示,该算法的平均精度、准确率、F1值在PA-100K和PETA数据集上分别为84.04%、79.71%、88.03%和89.04%、82.39%、89.06%,与ALM、JLAC等算法相比,能够充分利用属性语义与图像特征信息,在多项评价指标上有明显提升。

关键词: 行人属性识别, 自注意力, 卷积, 特征融合, 多标签分类

Abstract:

To improve the accuracy of pedestrian attribute recognition, this study proposes an algorithm that combines semantic and image information. It targets two problems: the natural semantic associations between pedestrian attributes are underused, and the image information relevant to each attribute is poorly extracted. First, the relationship-modeling ability of the self-attention mechanism is used to explore the intrinsic relationships among pedestrian attributes, and a cross-attention mechanism is used to relate the semantic information of the attributes to the image features. Second, convolution is used to fuse high-order and low-order image features and to add local feature information to the module, which improves the generalization ability of the model. Because of the design of the attribute prediction module, the model can be attached to any backbone network, which further improves recognition performance. Experimental results show that the mean precision, accuracy, and F1 value of the proposed algorithm are 84.04%, 79.71%, and 88.03% on the PA-100K dataset and 89.04%, 82.39%, and 89.06% on the PETA dataset, respectively. Compared with existing algorithms such as ALM and JLAC, the proposed algorithm makes fuller use of attribute semantics and image feature information and improves markedly on multiple evaluation metrics.

Key words: pedestrian attribute recognition, self-attention, convolution, feature fusion, multi-label classification
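
The two attention steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, residual connections, or normalization, and it omits the convolutional feature fusion entirely. The names `attention`, `predict_attributes`, `attr_embed`, `image_tokens`, `classifier_w`, and `classifier_b`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (assumed form)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def predict_attributes(attr_embed, image_tokens, classifier_w, classifier_b):
    """Return one independent probability per attribute (multi-label)."""
    # Step 1, self-attention: every attribute embedding attends to all
    # attribute embeddings, so related attributes can share information.
    attr_ctx = attention(attr_embed, attr_embed, attr_embed)
    # Step 2, cross-attention: each attribute acts as a query that gathers
    # evidence from the image tokens (keys and values).
    attr_img = attention(attr_ctx, image_tokens, image_tokens)
    # Step 3, prediction: a per-attribute linear layer followed by a sigmoid.
    probs = []
    for feat, w, b in zip(attr_img, classifier_w, classifier_b):
        logit = sum(f * wi for f, wi in zip(feat, w)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy data (hypothetical): 3 attributes, 4 image tokens, feature size 2.
attr_embed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_tokens = [[0.2, 0.9], [0.8, 0.1], [0.4, 0.4], [0.0, 1.0]]
classifier_w = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
classifier_b = [0.0, 0.0, -0.2]

probs = predict_attributes(attr_embed, image_tokens, classifier_w, classifier_b)
print([round(p, 3) for p in probs])
```

Each attribute gets its own sigmoid output rather than sharing a softmax, which matches the multi-label setting named in the key words: a pedestrian can carry several attributes at once, so the scores are not forced to sum to one.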