| 1 |
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84-90.
doi: 10.1145/3065386
|
| 2 |
REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 779-788.
|
| 3 |
LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of ECCV'16. Berlin, Germany: Springer International Publishing, 2016: 21-37.
|
| 4 |
|
| 5 |
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2010.11929.
|
| 6 |
LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 9992-10002.
|
| 7 |
CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]//Proceedings of ECCV'20. Berlin, Germany: Springer International Publishing, 2020: 213-229.
|
| 8 |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 770-778.
|
| 9 |
LIU Y D, WANG Y T, WANG S W, et al. CBNet: a novel composite backbone network architecture for object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2020: 11653-11660.
|
| 10 |
SUN Z Q, CAO S C, YANG Y M, et al. Rethinking Transformer-based set prediction for object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 3591-3600.
|
| 11 |
GAO P, ZHENG M H, WANG X G, et al. Fast convergence of DETR with spatially modulated co-attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 3601-3610.
|
| 12 |
YE M Q, KE L, LI S Y, et al. Cascade-DETR: delving into high-quality universal object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6681-6691.
|
| 13 |
ROH B, SHIN J, SHIN W C, et al. Sparse DETR: efficient end-to-end object detection with learnable sparsity[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2111.14330.
|
| 14 |
ZHENG D H, DONG W H, HU H L, et al. Less is more: focus attention for efficient DETR[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6651-6660.
|
| 15 |
WANG G M, JIA D W. Optimization of small object detection model based on YOLOv8. Computer Engineering, 2025, 51(12): 294-303.
doi: 10.19678/j.issn.1000-3428.0070027
|
| 16 |
DONG G, XIE W C, HUANG X L, et al. Review of small object detection algorithms based on deep learning. Computer Engineering and Applications, 2023, 59(11): 16-27.
|
| 17 |
|
| 18 |
WANG T, YUAN L, CHEN Y P, et al. PnP-DETR: towards efficient visual analysis with Transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 4641-4650.
|
| 19 |
|
| 20 |
TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 1-9.
|
| 21 |
ZONG Z F, SONG G L, LIU Y. DETRs with collaborative hybrid assignments training[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6725-6735.
|
| 22 |
|
| 23 |
MENG D P, CHEN X K, FAN Z J, et al. Conditional DETR for fast training convergence[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 3631-3640.
|
| 24 |
|
| 25 |
WANG Y M, ZHANG X Y, YANG T, et al. Anchor DETR: query design for Transformer-based detector[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2022: 2567-2575.
|
| 26 |
|
| 27 |
LIU Y, ZHANG Y, WANG Y X, et al. SAP-DETR: bridging the gap between salient points and queries-based Transformer detector for fast model convergency[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 15539-15547.
|
| 28 |
LI F, ZHANG H, LIU S L, et al. DN-DETR: accelerate DETR training by introducing query DeNoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 13609-13617.
|
| 29 |
ZHANG H, LI F, LIU S, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2203.03605.
|
| 30 |
CHEN Q, CHEN X K, WANG J, et al. Group DETR: fast DETR training with group-wise one-to-many assignment[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6610-6619.
|
| 31 |
JIA D, YUAN Y H, HE H D, et al. DETRs with hybrid matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 19702-19712.
|
| 32 |
PAN X Y, JIA N X, MU Y Z, et al. Survey of small object detection. Journal of Image and Graphics, 2023, 28(9): 2587-2615.
|
| 33 |
WANG F J, WANG X, WANG K D. Rotated small object detection of remote sensing images based on dual-domain query enhanced Transformer. Journal of Jilin University (Science Edition), 2025, 63(5): 1418-1426.
|
| 34 |
LI F, ZENG A L, LIU S L, et al. Lite DETR: an interleaved multi-scale encoder for efficient DETR[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 18558-18567.
|
| 35 |
|
| 36 |
ZHANG G, LUO Z, CUI K, et al. Meta-DETR: image-level few-shot object detection with inter-class correlation exploitation[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2103.11731.
|
| 37 |
BULAT A, GUERRERO R, MARTINEZ B, et al. FS-DETR: few-shot detection Transformer with prompting and without re-training[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 11759-11768.
|
| 38 |
RADFORD A, KIM J, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2103.00020.
|
| 39 |
|
| 40 |
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[EB/OL]. [2024-01-14]. https://arxiv.org/abs/1810.04805.
|
| 41 |
DAI Z G, CAI B L, LIN Y G, et al. Unsupervised pre-training for detection Transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12772-12782.
|
| 42 |
CARON M, MISRA I, MAIRAL J, et al. Unsupervised learning of visual features by contrasting cluster assignments[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2006.09882.
|
| 43 |
CHEN Z R, HUANG G S, LI W, et al. Siamese DETR[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 15722-15731.
|
| 44 |
LIU S L, HUANG S J, LI F, et al. DQ-DETR: dual query detection Transformer for phrase extraction and grounding[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 1728-1736.
|
| 45 |
KAMATH A, SINGH M, LECUN Y, et al. MDETR—modulated detection for end-to-end multi-modal understanding[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 1760-1770.
|
| 46 |
SHI F Y, GAO R P, HUANG W L, et al. Dynamic MDETR: a dynamic multimodal Transformer decoder for visual grounding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(2): 1181-1198.
doi: 10.1109/TPAMI.2023.3328185
|
| 47 |
ZANG Y H, LI W, ZHOU K Y, et al. Open-vocabulary DETR with conditional matching[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer Nature Switzerland, 2022: 106-122.
|
| 48 |
WANG J, SUN A, ZHANG H, et al. MS-DETR: natural language video localization with sampling moment-moment interaction[EB/OL]. [2024-01-14]. https://arxiv.org/abs/2305.18969.
|
| 49 |
ZHOU L J, MAO J N. Vision Transformer-based recognition tasks: a critical review. Journal of Image and Graphics, 2023, 28(10): 2969-3003.
|
| 50 |
WANG Y, SONG S J, WANG H Q, et al. Estimation of local illumination consistency based on improved Vision Transformer. Computer Engineering, 2025, 51(2): 312-321.
doi: 10.19678/j.issn.1000-3428.0068905
|