
Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 177-186. doi: 10.19678/j.issn.1000-3428.0067733

• Graphics and Image Processing •

Pose Estimation Algorithm for Small Target Pedestrians in Urban Street View Based on YOLO-Pose

Mingxu MA1, Hong MA2,*, Huawei SONG1

  1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000, Henan, China
  2. Information Technology Institute, Information Engineering University, Zhengzhou 450000, Henan, China
  • Received: 2023-05-30 Online: 2024-04-15 Published: 2024-04-22
  • Contact: Hong MA
  • Supported by:
    Major Science and Technology Special Project of Henan Province (221100210100)

Abstract:

Existing pose estimation algorithms perform poorly at detecting small target pedestrians in urban street scenes. To address this problem, this study proposes YOLO-Pose-CBAM, a pose estimation algorithm for small target pedestrians based on YOLO-Pose. First, the CBAM attention module is introduced to strengthen the network's focus on small target pedestrian regions and improve the algorithm's sensitivity to small target pedestrians without excessively increasing computation. At the same time, four detection heads of different sizes are used in the backbone network, giving the algorithm more ways to detect pedestrians of different sizes in an image. Second, two cross-layer cascading channels are built between the backbone and the neck, which improves feature fusion between the shallow and deep layers, further enhances information exchange, and reduces the miss rate for small target pedestrians. Third, SIoU is introduced to redefine the localization loss for bounding box regression, which speeds up training convergence and improves detection accuracy. Finally, the k-means++ algorithm replaces k-means for clustering the anchor boxes annotated in the dataset, which avoids the local optima caused by cluster-center initialization and yields anchor boxes better suited to detecting small target pedestrians. Comparative experiments show that, on the small target pedestrian dataset WiderKeypoints, the proposed algorithm improves Average Precision (AP) by 4.6 percentage points over YOLO-Pose and by 6.5 percentage points over YOLOv7-Pose.

Key words: YOLO-Pose algorithm, pose estimation, cross layer cascading, CBAM attention mechanism, SIoU loss function, k-means++ algorithm
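
The anchor clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, so the code assumes the 1 − IoU distance on (width, height) pairs that is common in YOLO-style anchor clustering, and the names `wh_iou`, `kmeanspp_init`, and `cluster_anchors` are hypothetical. The k-means++ part is the seeding, where each new center is drawn with probability proportional to its squared distance from the nearest center already chosen.

```python
import random


def wh_iou(box, center):
    """IoU of two (w, h) pairs, treating both boxes as sharing one corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    return inter / (box[0] * box[1] + center[0] * center[1] - inter)


def kmeanspp_init(boxes, k, rng):
    """k-means++ seeding with an assumed distance of d = 1 - IoU."""
    centers = [rng.choice(boxes)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each box to its nearest chosen center.
        weights = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        if sum(weights) == 0:
            # Every box coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=weights, k=1)[0])
    return centers


def cluster_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    boxes = [(float(w), float(h)) for w, h in boxes]
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: each box joins the center it overlaps most.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: wh_iou(b, centers[j]))
            groups[best].append(b)
        # Update step: mean width and height; an empty cluster keeps its center.
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Calling `cluster_anchors(boxes, k=9)` on the labeled box sizes of a dataset would return nine (w, h) anchors ordered from smallest to largest area. The value of `k` and the use of the mean in the update step are illustrative choices, not settings taken from the paper.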