
Computer Engineering, 2022, Vol. 48, Issue 3: 211-219. doi: 10.19678/j.issn.1000-3428.0060518

• Graphics and Image Processing •

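  • About the authors: QIN Peng (born 1996), male, master's student, main research interests: deep learning and infrared target detection; TANG Chuanming, master's student; LIU Yunfeng (corresponding author), associate research fellow, Ph.D.; ZHANG Jianlin, research fellow, Ph.D.; XU Zhiyong, research fellow.
  • Funding:
    Innovation Project of the National Science and Technology Commission (G158207).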

Infrared Target Detection Method Based on Improved YOLOv3

QIN Peng1,2,3, TANG Chuanming1,2,3, LIU Yunfeng1,2, ZHANG Jianlin1,2, XU Zhiyong1,2   

  1. Key Laboratory of Beam Control, Chinese Academy of Sciences, Chengdu 610209, China;
  2. Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China;
  3. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-07; Revised: 2021-03-09; Published: 2021-03-24



Abstract: To address the low recognition rate of targets such as pedestrians and vehicles in infrared scenes, together with interference from complex backgrounds, an infrared target detection method based on the Effi-YOLOv3 model is proposed. The method combines the lightweight and efficient EfficientNet backbone with the YOLOv3 network to improve the running speed of the model. By simulating the receptive field mechanism of human vision, an improved Receptive Field Block (RFB) is introduced that substantially enlarges the effective receptive field of the network with almost no increase in computation. DBD and CBD structures are then built from deformable convolution and dynamic activation functions to make feature encoding more flexible and to increase model capacity. Finally, CIoU, which accounts for the distance between the center points of the predicted box and the ground-truth box, their overlap ratio, and the deviation in their aspect ratios, is selected as the loss function; it better reflects the degree of overlap between the two boxes and speeds up bounding-box regression. Experimental results show that the proposed method reaches a mean Average Precision (mAP) of 70.8% on the FLIR dataset, that the Effi-YOLOv3 model has only 33.3% of the parameters of the YOLOv3 model, and that it detects targets in infrared scenes more effectively.
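The abstract names three ingredients of the CIoU loss: overlap ratio, center-point distance, and aspect-ratio deviation. The sketch below shows how these terms are commonly combined in the published CIoU definition. It is an illustration only, not the authors' implementation: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box format and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: intersection over union of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Center-distance term: squared distance between box centers, divided by
    # the squared diagonal of the smallest box enclosing both boxes.
    rho2 = (((px1 + px2) - (tx1 + tx2)) ** 2 + ((py1 + py2) - (ty1 + ty2)) ** 2) / 4
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch between the width/height ratios.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For two non-overlapping unit squares such as `(0, 0, 1, 1)` and `(2, 2, 3, 3)`, the IoU term alone is constant at 1, but the center-distance penalty adds 8/18, so the loss is about 1.44 and still decreases as the boxes move closer. This is one way to read the abstract's claim that CIoU speeds up box regression.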

Key words: YOLOv3 model, infrared target detection, complex background, deformable convolution, dynamic activation function

CLC number: