
Computer Engineering, 2023, Vol. 49, Issue (12): 178-185. doi: 10.19678/j.issn.1000-3428.0066683

• Graphics and Image Processing •

  • About the authors:

    Lu HUANG (born 1997), male, M.S. candidate; his main research interests are object detection and computer vision

    Zeping LI, professor, Ph.D.

    Wenbang YANG, Ph.D. candidate

    Yong ZHAO, associate professor, Ph.D.

    Di ZHANG, M.S. candidate

  • Supported by:
    National Natural Science Foundation of China (61462014); Guizhou Provincial Youth Science and Technology Talent Growth Project (Qian Jiao He KY [2018]411)

Multi-Scale Object Detection Algorithm Based on Regional Perception

Lu HUANG1, Zeping LI1, Wenbang YANG1, Yong ZHAO2, Di ZHANG1   

  1. College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
    2. School of Information Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, Guangdong, China
  • Received: 2023-01-04 Online: 2023-12-15 Published: 2023-03-22


Abstract:

A multi-scale object detection algorithm based on regional perception is proposed to address the loss of feature information in the main branch layers and the imbalanced feature expression capabilities across scales in object detection networks. On the basis of YOLOv5, a stronger baseline model is constructed using data augmentation, an improved bounding-box loss, and Non-Maximum Suppression (NMS). A Channel Information Enhancement Module (CIEM) is designed along the channel direction using operations such as Global Maximum Pooling (GMP), Global Average Pooling (GAP), and convolution, and is applied to each main branch layer of the backbone network, so that the detection heads do not lose the key features of the main branch layers during feature fusion, thereby strengthening the model's perception of key regions. A Weighted Feature Fusion Method (WFFM) is used to fuse feature information from different scales, balancing the contributions of input features at different scales to the output features and improving the model's perception of multi-scale objects. By adjusting the channel width and depth of the model, four network structures of different scales are designed. Experimental results show that, compared with YOLOv5s, the proposed algorithm improves the mean Average Precision (mAP) by 5.48, 3.00, 1.94, 0.70, and 1.95 percentage points on the Pascal VOC, MS COCO, Global Wheat, Wider Face, and Motor Defect datasets, respectively. Moreover, the algorithm achieves a maximum mAP of 50.7%, which is 7.2 and 3.0 percentage points higher than the largest models of YOLOv4 and Dynamic Head, respectively.
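The following is a minimal PyTorch sketch, not the authors' implementation, illustrating how a channel information enhancement block and a weighted feature fusion step of the kind described above could look. The class names, the shared 1x1-convolution bottleneck, the reduction ratio, the SiLU activation, and the BiFPN-style normalized weights are assumptions for illustration; the abstract does not specify the exact structure.

```python
# Illustrative sketch (assumed design, not the paper's exact modules).
import torch
import torch.nn as nn


class ChannelInformationEnhancement(nn.Module):
    """Hypothetical CIEM: turns GMP and GAP channel statistics into an
    attention map that re-weights a main-branch feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # Global Average Pooling
        self.gmp = nn.AdaptiveMaxPool2d(1)   # Global Maximum Pooling
        self.fc = nn.Sequential(             # shared 1x1 convs on channel stats
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.sigmoid(self.fc(self.gap(x)) + self.fc(self.gmp(x)))
        return x * attn                       # emphasize informative channels


class WeightedFeatureFusion(nn.Module):
    """Hypothetical WFFM: fuses same-shaped inputs from different scales with
    learnable, normalized non-negative weights (BiFPN-style)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):                # list of tensors with equal shape
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)          # weights sum to roughly 1
        return sum(wi * xi for wi, xi in zip(w, inputs))


if __name__ == "__main__":
    x = torch.randn(1, 256, 40, 40)
    ciem = ChannelInformationEnhancement(256)
    fuse = WeightedFeatureFusion(num_inputs=2)
    y = fuse([ciem(x), torch.randn(1, 256, 40, 40)])
    print(y.shape)  # torch.Size([1, 256, 40, 40])
```

In this sketch the CIEM output keeps the input resolution, so it can be inserted after each backbone main-branch layer before the features enter the fusion path; the fusion weights are learned per fusion node, which is one common way to balance multi-scale contributions.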

Key words: object detection, enhanced baseline model, channel information enhancement, weighted feature fusion, multi-scale object