[1] TIAN B, YAO Q, GU Y, et al. Video processing techniques for traffic flow monitoring: a survey[C]//Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems. Washington D.C., USA: IEEE Press, 2011: 1103-1108.
[2] YILMAZ A, JAVED O, SHAH M. Object tracking: a survey[J]. ACM Computing Surveys, 2006, 38(4): 1-45.
[3] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 2805-2813.
[4] COMANICIU D, RAMESH V, MEER P. Real-time tracking of non-rigid objects using mean shift[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2000, 2: 142-149.
[5] BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2010: 2544-2550.
[6] HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2012: 702-715.
[7] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[8] LI Y, ZHU J. A scale adaptive kernel correlation filter tracker with feature integration[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2014: 254-265.
[9] DANELLJAN M, HÄGER G, KHAN F, et al. Accurate scale estimation for robust visual tracking[C]//Proceedings of British Machine Vision Conference. Nottingham, UK: [s.n.], 2014: 1-11.
[10] DANELLJAN M, HÄGER G, SHAHBAZ KHAN F, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2015: 4310-4318.
[11] GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2017: 1135-1143.
[12] LI F, TIAN C, ZUO W, et al. Learning spatial-temporal regularized correlation filters for visual tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 4904-4913.
[13] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[14] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 850-865.
[15] LI B, YAN J, WU W, et al. High performance visual tracking with siamese region proposal network[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 8971-8980.
[16] ZHU Z, WANG Q, LI B, et al. Distractor-aware siamese networks for visual object tracking[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 101-117.
[17] ZHANG Z, PENG H. Deeper and wider siamese networks for real-time visual tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 4591-4600.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 770-778.
[19] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-10-15]. http://arxiv.org/pdf/1409.1556.pdf.
[20] XIE S, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 1492-1500.
[21] LI B, WU W, WANG Q, et al. SiamRPN++: evolution of siamese visual tracking with very deep networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 4282-4291.
[22] FAN H, LING H. Siamese cascaded region proposal networks for real-time visual tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 7952-7961.
[23] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[24] HU H, GU J, ZHANG Z, et al. Relation networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 3588-3597.
[25] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 7132-7141.
[26] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 7794-7803.
[27] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 5998-6008.
[28] CAO Y, XU J, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of IEEE/CVF International Conference on Computer Vision Workshop. Washington D.C., USA: IEEE Press, 2019: 1-11.
[29] HOU Q, ZHANG L, CHENG M M, et al. Strip pooling: rethinking spatial pooling for scene parsing[EB/OL]. [2020-10-16]. https://arxiv.org/abs/2003.13328v1.
[30] REZATOFIGHI H, TSOI N, GWAK J Y, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 658-666.
[31] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[EB/OL]. [2020-10-15]. https://arxiv.org/abs/1911.08287.
[32] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. [2020-10-10]. https://arxiv.org/abs/1502.03167.
[33] REAL E, SHLENS J, MAZZOCCHI S, et al. YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 5296-5305.
[34] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 445-461.
[35] JIA X, LU H, YANG M H. Visual tracking via adaptive structural local sparse appearance model[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2012: 1822-1829.
[36] HONG Z, CHEN Z, WANG C, et al. Multi-store tracker (MUSTer): a cognitive psychology inspired approach to object tracking[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2015: 749-758.
[37] ROSS D A, LIM J, LIN R S, et al. Incremental learning for robust visual tracking[J]. International Journal of Computer Vision, 2008, 77(1-3): 125-141.