
Computer Engineering ›› 2021, Vol. 47 ›› Issue (6): 277-283. doi: 10.19678/j.issn.1000-3428.0057892

• Graphics and Image Processing •

Person Re-Identification in Video Based on Spatial-Temporal Attention Region

HU Xiaoqiang, WEI Dan, WANG Ziyang, SHEN Jianglin, REN Hongjuan   

  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2020-03-30  Revised: 2020-05-14  Published: 2020-06-08
  • Supported by: Youth Program of the National Natural Science Foundation of China (51805312)
  • Contact: WEI Dan  E-mail: weiweidandan@163.com

  • About the authors: HU Xiaoqiang (b. 1996), male, M.S. candidate; his research interests include pattern recognition and person re-identification. WEI Dan (corresponding author), Lecturer, Ph.D. WANG Ziyang and SHEN Jianglin, M.S. candidates. REN Hongjuan, Associate Professor, Ph.D.

Abstract: When performing the person re-identification task on videos, traditional local-based methods mainly focus on learning local feature representations in regions with specific predefined semantics, so their learning efficiency and robustness are reduced in complex scenes. This paper combines global and local features to propose a person re-identification method for video based on spatial-temporal attention regions. The attention region features aggregated across frames are fused with the global feature to obtain a video-level feature representation, and the two pathways of a SlowFast network are used to extract the global feature and the attention region features respectively. In the fast path, a multiple spatial attention model extracts the attention region features, and a temporal aggregation model aggregates the attention region features of the same body part across all sampled frames. In the slow path, global features are extracted by a Convolutional Neural Network (CNN). On this basis, an affinity matrix and location parameters are used to fuse the attention region features with the global feature. The average Euclidean distance is used to evaluate the fusion loss, and the triplet loss function is used for end-to-end network training. The experimental results show that the Rank-1 accuracy of this method reaches 93.4% on the PRID 2011 dataset and its mAP reaches 79.5% on the MARS dataset, demonstrating better recognition performance than SeeForest, ASTPN, RQEN and other methods, as well as excellent robustness to changes in illumination and person pose and to occlusion.

Key words: person re-identification, attention region, temporal aggregation, global feature, feature fusion
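To make the described pipeline concrete, the following minimal PyTorch sketch illustrates the overall data flow: per-frame features from a shared backbone, K spatial attention maps that each pool one region descriptor, temporal aggregation of region features across frames, fusion with the global feature, and training with a triplet loss. All module names, dimensions, and the simplified choices (mean pooling as a stand-in for the temporal aggregation model, concatenation as a stand-in for the affinity-matrix fusion, a toy backbone instead of the SlowFast pathways) are illustrative assumptions, not the authors' implementation.

# A minimal sketch of the two-pathway design described in the abstract.
# Module names, dimensions, and the aggregation/fusion rules are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiSpatialAttention(nn.Module):
    """Produces K attention maps over a frame's feature map and pools
    one region descriptor per map (the fast path's local branch)."""

    def __init__(self, channels: int, num_regions: int = 4):
        super().__init__()
        self.att = nn.Conv2d(channels, num_regions, kernel_size=1)

    def forward(self, feat):                        # feat: (B, C, H, W)
        maps = self.att(feat)                       # (B, K, H, W)
        maps = F.softmax(maps.flatten(2), dim=-1)   # normalize over H*W
        feat = feat.flatten(2)                      # (B, C, H*W)
        # Weighted pooling: one C-dim descriptor per attention region.
        return torch.einsum('bkn,bcn->bkc', maps, feat)  # (B, K, C)


class STAttentionReID(nn.Module):
    """Slow path: global CNN feature; fast path: attention region features
    aggregated over time, then fused with the global feature."""

    def __init__(self, channels: int = 64, num_regions: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(              # toy stand-in for the CNN backbone
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 4)))
        self.attention = MultiSpatialAttention(channels, num_regions)
        self.fuse = nn.Linear(channels * (num_regions + 1), channels)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feat = self.backbone(clip.flatten(0, 1))    # (B*T, C, 8, 4)
        regions = self.attention(feat)              # (B*T, K, C)
        # Mean over frames stands in for the paper's temporal aggregation model.
        regions = regions.view(b, t, *regions.shape[1:]).mean(1)      # (B, K, C)
        global_feat = feat.mean(dim=(2, 3)).view(b, t, -1).mean(1)    # (B, C)
        # Concatenation stands in for the affinity-matrix fusion.
        fused = torch.cat([global_feat, regions.flatten(1)], dim=1)
        return self.fuse(fused)                     # video-level embedding


# End-to-end training with a triplet loss, as the abstract states.
model = STAttentionReID()
anchor, positive, negative = (torch.randn(2, 8, 3, 64, 32) for _ in range(3))
loss = F.triplet_margin_loss(model(anchor), model(positive), model(negative))
loss.backward()

Mean pooling and concatenation keep the sketch short; a faithful implementation would replace them with the paper's temporal aggregation model and affinity-matrix fusion, and use a SlowFast backbone in place of the toy convolutional stem.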

