1 |
REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 779-788.
|
2 |
LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer International Publishing, 2016.
|
3 |
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
doi: 10.1109/TPAMI.2016.2577031
|
4 |
HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 2980-2988.
|
5 |
|
6 |
CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer International Publishing, 2020: 213-229.
|
7 |
LIU L, OUYANG W L, WANG X G, et al. Deep learning for generic object detection: a survey. International Journal of Computer Vision, 2020, 128(2): 261-318.
doi: 10.1007/s11263-019-01247-4
|
8 |
ZAIDI S S A, ANSARI M S, ASLAM A, et al. A survey of modern deep learning based object detection models. Digital Signal Processing, 2022, 126: 103514.
doi: 10.1016/j.dsp.2022.103514
|
9 |
ARKIN E, YADIKAR N, MUHTAR Y, et al. A survey of object detection based on CNN and transformer[C]//Proceedings of the 2nd International Conference on Pattern Recognition and Machine Learning (PRML). Washington D.C., USA: IEEE Press, 2021: 99-108.
|
10 |
KHAN S, NASEER M, HAYAT M, et al. Transformers in vision: a survey. ACM Computing Surveys, 2022, 54(10): 1-41.
doi: 10.1145/3505244
|
11 |
ARKIN E, YADIKAR N, XU X B, et al. A survey: object detection methods from CNN to transformer. Multimedia Tools and Applications, 2023, 82(14): 21353-21383.
doi: 10.1007/s11042-022-13801-3
|
12 |
CHAUDHARI S, MITHAL V, POLATKAN G, et al. An attentive survey of attention models. ACM Transactions on Intelligent Systems and Technology, 2021, 12(5): 1-32.
doi: 10.1145/3465055
|
13 |
李清格, 杨小冈, 卢瑞涛, 等. 计算机视觉中的Transformer发展综述. 小型微型计算机系统, 2023, 44(4): 850-861.
|
|
LI Q G, YANG X G, LU R T, et al. Transformer in computer vision: a survey. Journal of Chinese Computer Systems, 2023, 44(4): 850-861.
|
14 |
田永林, 王雨桐, 王建功, 等. 视觉Transformer研究的关键问题: 现状及展望. 自动化学报, 2022, 48(4): 957-979.
doi: 10.16383/j.aas.c220027
|
|
TIAN Y L, WANG Y T, WANG J G, et al. Key issues in visual Transformer research: current status and prospects. Acta Automatica Sinica, 2022, 48(4): 957-979.
doi: 10.16383/j.aas.c220027
|
15 |
李建, 杜建强, 朱彦陈, 等. 基于Transformer的目标检测算法综述. 计算机工程与应用, 2023, 59(10): 48-64.
doi: 10.3778/j.issn.1002-8331.2211-0133
|
|
LI J, DU J Q, ZHU Y C, et al. Survey of Transformer-based object detection algorithms. Computer Engineering and Applications, 2023, 59(10): 48-64.
doi: 10.3778/j.issn.1002-8331.2211-0133
|
16 |
刘宇晶. 基于Transformer的目标检测研究综述. 计算机时代, 2023(5): 6-10.
doi: 10.19850/j.cnki.2096-4706.2021.07.004
|
|
LIU Y J. Summary of research on target detection based on Transformer. Computer Era, 2023(5): 6-10.
doi: 10.19850/j.cnki.2096-4706.2021.07.004
|
17 |
EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303-338.
doi: 10.1007/s11263-009-0275-4
|
18 |
|
19 |
DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2009: 248-255.
|
20 |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 770-778.
|
21 |
ZHENG S X, LU J C, ZHAO H S, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with Transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 6877-6886.
|
22 |
XIE E Z, WANG W H, YU Z D, et al. SegFormer: simple and efficient design for semantic segmentation with Transformers[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2105.15203.
|
23 |
|
24 |
ZHANG H, XU T, LI H S, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 5908-5916.
|
25 |
ZHANG H, XU T, LI H S, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962.
doi: 10.1109/TPAMI.2018.2856256
|
26 |
GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks. Communications of the ACM, 2020, 63(11): 139-144.
doi: 10.1145/3422622
|
27 |
CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//Proceedings of the International Conference on Machine Learning (ICML). New York, USA: PMLR, 2020: 1691-1703.
|
28 |
ESSER P, ROMBACH R, OMMER B. Taming Transformers for high-resolution image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 12868-12878.
|
29 |
JIANG Y F, CHANG S Y, WANG Z Y. TransGAN: two pure Transformers can make one strong GAN, and that can scale up[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2102.07074.
|
30 |
|
31 |
|
32 |
LIU S, FAN H Q, QIAN S S, et al. HiT: hierarchical Transformer with momentum contrast for video-text retrieval[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 11895-11905.
|
33 |
LIN K, LI L J, LIN C C, et al. SwinBERT: end-to-end Transformers with sparse attention for video captioning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 17928-17937.
|
34 |
GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2014: 580-587.
|
35 |
DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 764-773.
|
36 |
ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2010.04159.
|
37 |
DAI X Y, CHEN Y P, YANG J W, et al. Dynamic DETR: end-to-end object detection with dynamic attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 2968-2977.
|
38 |
YAO Z Y, AI J B, LI B X, et al. Efficient DETR: improving end-to-end object detector with dense prior[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2104.01318.
|
39 |
GAO P, ZHENG M H, WANG X G, et al. Fast convergence of DETR with spatially modulated co-attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 3601-3610.
|
40 |
ROH B, SHIN J, SHIN W, et al. Sparse DETR: efficient end-to-end object detection with learnable sparsity[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2111.14330.
|
41 |
|
42 |
LI F, ZENG A L, LIU S L, et al. Lite DETR: an interleaved multi-scale encoder for efficient DETR[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 18558-18567.
|
43 |
ZHANG G J, LUO Z P, YU Y C, et al. Accelerating DETR convergence via semantic-aligned matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 939-948.
|
44 |
CAO X P, YUAN P, FENG B L, et al. CF-DETR: coarse-to-fine Transformers for end-to-end object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2022: 185-193.
|
45 |
|
46 |
MENG D P, CHEN X K, FAN Z J, et al. Conditional DETR for fast training convergence[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 3631-3640.
|
47 |
ZHANG H, LI F, LIU S L, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection[EB/OL]. [2023-09-15]. http://arxiv.org/abs/2203.03605.
|
48 |
DAI Z G, CAI B L, LIN Y G, et al. UP-DETR: unsupervised pre-training for object detection with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 1601-1610.
|
49 |
LI F, ZHANG H, LIU S L, et al. DN-DETR: accelerate DETR training by introducing query DeNoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 13609-13617.
|
50 |
CHEN Q, CHEN X K, WANG J, et al. Group DETR: fast DETR training with group-wise one-to-many assignment[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6610-6619.
|