[1] HUANG S, SIREJIDING S, LU Y, et al. YOLO-Med: multi-task interaction network for biomedical images[C]//ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024: 2175–2179.
[2] LIU W, QIAO X, ZHAO C, et al. VP-YOLO: a human visual perception-inspired robust vehicle-pedestrian detection model for complex traffic scenarios[J]. Expert Systems with Applications, 2025: 126837.
[3] LIU Z H, ZHANG J X, XUE F, et al. Surface defect detection method of precision pipe fittings based on improved YOLO-v8[J]. Journal of Zhejiang University (Engineering Science), 2025, 59: 1514–1522+1546.
[4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580–587.
[5] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779–788.
[6] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//European Conference on Computer Vision. Springer, 2016: 21–37.
[7] ULTRALYTICS. YOLO11[EB/OL]. [2026-03-24]. https://github.com/ultralytics/ultralytics.
[8] ULTRALYTICS. Ultralytics YOLO documentation[EB/OL]. [2026-03-24]. https://docs.ultralytics.com.
[9] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[10] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132–7141.
[11] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13713–13722.
[12] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8759–8768.
[13] ZHENG Z, WANG P, REN D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation[J]. IEEE Transactions on Cybernetics, 2021, 52: 8574–8586.
[14] LI X, WANG W, WU L, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002–21012.
[15] WU D, ZHAO P Y, GAN S L, et al. Small object detection based on dynamic adaptive channel attention feature fusion[J]. Journal of University of Electronic Science and Technology of China, 2025, 54: 221–232.
[16] XIAO T, LIU Y, HUANG Y, et al. Enhancing multiscale representations with transformer for remote sensing image semantic segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1–16.
[17] PENG Y, LI H, WU P, et al. D-FINE: redefine regression task in DETRs as fine-grained distribution refinement[J]. arXiv preprint arXiv:2410.13842, 2024.
[18] LI H. CDLA: a Chinese document layout analysis (CDLA) dataset[EB/OL]. (2021).
[19] DA C, LUO C, ZHENG Q, et al. Vision grid transformer for document layout analysis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 19462–19472.
[20] ZHONG X, TANG J, YEPES A J. PubLayNet: largest dataset ever for document layout analysis[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019: 1015–1022.
[21] EVERINGHAM M, VAN GOOL L, WILLIAMS C K, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88: 303–338.
[22] TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]//International Conference on Machine Learning. PMLR, 2019: 6105–6114.
[23] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28.
[24] ULTRALYTICS. YOLOv8[EB/OL]. [2026-03-24]. https://docs.ultralytics.com/models/yolov8.
[25] WANG A, CHEN H, LIU L, et al. YOLOv10: real-time end-to-end object detection[J]. Advances in Neural Information Processing Systems, 2024, 37: 107984–108011.
[26] GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[J]. arXiv preprint arXiv:2107.08430, 2021.
[27] ULTRALYTICS. YOLO26[EB/OL]. [2026-03-24]. https://docs.ultralytics.com/models/yolo26.
[28] ZHAO Y, LV W, XU S, et al. DETRs beat YOLOs on real-time object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 16965–16974.
[29] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 658–666.
[30] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020: 12993–13000.
[31] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3–19.
[32] CAI X, LAI Q, WANG Y, et al. Poly kernel inception network for remote sensing detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 27706–27716.
[33] TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10781–10790.
[34] ZHANG Y, ZHOU S, LI H. Depth information assisted collaborative mutual promotion network for single image dehazing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 2846–2855.
[35] QIN D, LEICHNER C, DELAKIS M, et al. MobileNetV4: universal models for the mobile ecosystem[C]//European Conference on Computer Vision. Springer, 2024: 78–96.
[36] JACOB B, KLIGYS S, CHEN B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 2704–2713.
[37] LI H, KADAV A, DURDANOVIC I, et al. Pruning filters for efficient ConvNets[J]. arXiv preprint arXiv:1608.08710, 2016.