Pedestrian Attribute Recognition Algorithm Combining Semantic and Image Information

doi:10.19678/j.issn.1000-3428.0064971

Abstract

To improve the recognition precision of pedestrian attributes, and to address both the underuse of the natural semantic associations among attributes and the poor extraction of the image information relevant to each attribute, this study proposes a pedestrian attribute recognition algorithm that combines semantic and image information. First, the relationship-modeling ability of the self-attention mechanism is used to explore the intrinsic relationships among pedestrian attributes, and cross-attention is used to relate the semantic information of the attributes to the image feature information. Second, convolution is used to fuse the high-order and low-order image features and to add local feature information to the module, which improves the generalization ability of the model. In addition, an attribute prediction module is designed so that the model can be attached to any backbone network, which further improves recognition performance. Experimental results show that the mean precision, accuracy, and F1 value of the proposed algorithm are 84.04%, 79.71%, and 88.03% on the PA-100K dataset and 89.04%, 82.39%, and 89.06% on the PETA dataset. Compared with existing algorithms such as ALM and JLAC, the proposed algorithm makes fuller use of attribute semantics and image feature information and improves markedly on multiple evaluation metrics.

Key words: pedestrian attribute recognition, self-attention, convolution, feature fusion, multi-label classification
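
The two attention steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embeddings, the image tokens, and the linear prediction head (`head_weights`) are hypothetical, and the sketch assumes single-head scaled dot-product attention with no learned projections, residual connections, or normalization. The convolutional fusion of high-order and low-order features is not modeled here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; every argument is a list of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Hypothetical toy data: 3 attribute embeddings and 4 image tokens, all 4-dimensional.
attribute_embeddings = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4, 0.0],
]
image_tokens = [
    [0.2, 0.1, 0.7, 0.0],
    [0.9, 0.0, 0.1, 0.3],
    [0.0, 0.8, 0.2, 0.6],
    [0.4, 0.4, 0.4, 0.4],
]

# Step 1: self-attention lets each attribute embedding absorb context from the others.
attribute_context = attention(
    attribute_embeddings, attribute_embeddings, attribute_embeddings
)

# Step 2: cross-attention uses the attribute vectors as queries over the image tokens.
attribute_features = attention(attribute_context, image_tokens, image_tokens)

# Step 3: a hypothetical linear head plus sigmoid gives one probability per attribute,
# which is the usual output form for multi-label classification.
head_weights = [0.5, -0.3, 0.8, 0.1]
probabilities = [
    1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(head_weights, feat))))
    for feat in attribute_features
]
print(probabilities)
```

Because each attribute keeps its own query vector through both steps, the output contains one image-conditioned feature vector per attribute, so every attribute can be scored independently.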

Zuhe YANG, Zhihui LI, Yunqi TANG, Yuwen YAN, Huaqing SONG. Pedestrian Attribute Recognition Algorithm Combining Semantic and Image Information[J]. Computer Engineering, 2023, 49(8): 215-222, 231.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0064971

http://www.ecice06.com/EN/Y2023/V49/I8/215


Encoder layer structure	Parameters/10⁶	Computation/10⁹	PETA mean precision/%	PETA F1 value/%	PA-100K mean precision/%	PA-100K F1 value/%
SA	33.58	2.98	88.86	89.01	83.76	87.69
Conv	29.89	0.73	89.04	89.06	84.04	88.03
All	135.72	16.81	89.04	89.06	84.04	88.03
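
The cost difference between the first two rows of this table can be worked out directly. The short sketch below reads SA and Conv as the self-attention and convolutional variants of the encoder layer, which is an interpretation because the table has no caption here. The numbers are copied from the table and the variable names are illustrative only.

```python
# Parameter count (10^6) and computation (10^9) copied from the SA and Conv rows.
sa_params, sa_compute = 33.58, 2.98
conv_params, conv_compute = 29.89, 0.73

def relative_saving(baseline, candidate):
    """Percentage by which candidate is smaller than baseline."""
    return 100.0 * (baseline - candidate) / baseline

param_saving = relative_saving(sa_params, conv_params)
compute_saving = relative_saving(sa_compute, conv_compute)

print(f"Parameters saved by Conv relative to SA: {param_saving:.2f}%")
print(f"Computation saved by Conv relative to SA: {compute_saving:.2f}%")
```

Under this reading, the Conv variant is cheaper on both cost columns while its reported metrics are no lower than those of the SA variant on either dataset.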

Experimental scheme	Mean precision/%	Accuracy/%	Precision/%	Recall/%	F1 value/%
PETA→PA-100K(Before)	59.48	54.38	68.66	70.86	69.74
PA-100K→PETA(Before)	55.96	48.24	61.41	66.63	63.91
PETA→PA-100K(After)	65.57	62.31	74.44	77.53	75.98
PA-100K→PETA(After)	58.36	49.44	60.57	70.16	65.01

Backbone network	Mean precision/%	Accuracy/%	Precision/%	Recall/%	F1 value/%
ResNet(baseline)	79.84	73.04	76.89	86.26	81.31
EfficientNet(baseline)	78.45	74.74	81.06	86.50	83.69
ViT(baseline)	80.32	71.10	77.70	85.46	81.39
Swin Transformer(baseline)	82.58	78.54	85.27	89.23	87.21
ResNet	82.31	76.78	81.35	91.80	86.26
EfficientNet	81.52	76.29	84.42	86.40	85.40
ViT	83.21	79.33	87.10	88.21	87.65
Swin Transformer	84.04	79.71	84.74	91.59	88.03
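
The relationship between the last three columns of the backbone table can be checked with a short sketch. It assumes that the F1 value is the harmonic mean of the tabulated precision and recall, a definition this page does not state. The rows are copied from the table and the helper name `f1_from` is illustrative.

```python
# (backbone, precision %, recall %, reported F1 %) copied from the backbone table.
ROWS = [
    ("ResNet(baseline)", 76.89, 86.26, 81.31),
    ("EfficientNet(baseline)", 81.06, 86.50, 83.69),
    ("ViT(baseline)", 77.70, 85.46, 81.39),
    ("Swin Transformer(baseline)", 85.27, 89.23, 87.21),
    ("ResNet", 81.35, 91.80, 86.26),
    ("EfficientNet", 84.42, 86.40, 85.40),
    ("ViT", 87.10, 88.21, 87.65),
    ("Swin Transformer", 84.74, 91.59, 88.03),
]

def f1_from(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)

# Absolute gap between the recomputed F1 and the value printed in the table.
deviations = {
    name: abs(f1_from(precision, recall) - reported)
    for name, precision, recall, reported in ROWS
}

for name, gap in deviations.items():
    print(f"{name}: recomputed F1 differs from the reported value by {gap:.3f}")
```

Every row agrees with the reported F1 value to within rounding of the two-decimal inputs, which is consistent with the assumed definition.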
