
Computer Engineering ›› 2023, Vol. 49 ›› Issue (12): 186-193. doi: 10.19678/j.issn.1000-3428.0066192

• Graphics and Image Processing •

Salient Object Detection with Multi-Scale Visual Perception and Fusion

Zhongren LIU, Li PENG

  1. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214000, Jiangsu, China
  • Received: 2022-11-06  Online: 2023-12-15  Published: 2023-12-14
  • About the authors:

    LIU Zhongren (b. 1997), male, M.S.; his main research interests include pattern recognition and deep learning

    PENG Li, professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (61873112, 61802107)


Abstract:

Most salient object detection algorithms suffer from single-feature detection defects and insufficient fusion of multiple features, which lead to saliency maps with unclear edges and poor background suppression. To address these problems, a salient object detection method with multi-scale visual perception and fusion is proposed. It comprises a Multi-scale Visual Perception Module (MVPM) and a Multi-scale Feature Fusion Module (MFFM), which respectively process the global information of salient objects and fuse multi-scale features. Built on a U-shaped network structure, the MVPM uses dilated convolutions to simulate the receptive fields of the visual cortex, fully exploiting the role of dilated convolution in Convolutional Neural Networks (CNNs). It extracts the global spatial information of salient objects stage by stage in the backbone network, effectively enhancing foreground saliency regions and suppressing background noise regions. The MFFM combines a feature pyramid with a spatial attention mechanism to fuse high-level semantic information with detail information, effectively recovering the spatial structure of salient objects while suppressing the propagation of noise. Experiments on five image datasets with complex background information, including ECSSD, DUTS, and SOD, show that the proposed method achieves an average F-Measure of 88.4%, 14.2 percentage points higher than the baseline network U-Net, and a Mean Absolute Error (MAE) of 3.5%, 5.4 percentage points lower than the baseline.
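As described above, the MVPM stacks dilated convolutions to enlarge receptive fields and capture global spatial information. The growth of the receptive field under stacking can be sketched with the standard formula for dilated convolution; the dilation rates 1, 2, 4 below are illustrative assumptions, not values taken from the paper:

```python
def effective_kernel(k: int, d: int) -> int:
    # A k-tap convolution with dilation d spans d*(k-1)+1 input positions.
    return d * (k - 1) + 1

def stacked_receptive_field(layers) -> int:
    # Receptive field of a stack of stride-1 convolutions:
    # each layer adds (effective_kernel - 1) to the running field.
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# Three 3x3 convolutions with dilation rates 1, 2, 4 (hypothetical configuration)
# already cover a 15x15 neighborhood, versus 7x7 for three ordinary 3x3 layers.
print(stacked_receptive_field([(3, 1), (3, 2), (3, 4)]))
```

This illustrates why dilated convolution is attractive in a backbone: the receptive field grows rapidly without extra parameters or downsampling, which matches the module's goal of enhancing foreground regions using global context.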

Keywords: Convolutional Neural Network (CNN), salient object detection, multi-scale visual perception, multi-scale feature fusion, receptive field

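The reported F-Measure and MAE figures can be interpreted through their standard definitions. The sketch below assumes the conventional salient-object-detection protocol (weighted F-measure with β² = 0.3, and MAE as the mean pixel-wise absolute difference between the saliency map and the ground-truth mask); the abstract does not spell out the exact evaluation settings:

```python
def f_measure(precision: float, recall: float, beta2: float = 0.3) -> float:
    # Weighted F-measure; beta^2 = 0.3 emphasizes precision, a common
    # convention in the salient object detection literature (assumed here).
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

def mae(saliency, ground_truth) -> float:
    # Mean Absolute Error between a predicted saliency map and the binary
    # ground-truth mask, both flattened to per-pixel values in [0, 1].
    assert len(saliency) == len(ground_truth)
    return sum(abs(s - g) for s, g in zip(saliency, ground_truth)) / len(saliency)

# Toy 4-pixel example (illustrative values, not from the paper's experiments).
pred = [0.9, 0.8, 0.1, 0.0]
gt   = [1.0, 1.0, 0.0, 0.0]
print(f_measure(precision=0.9, recall=0.8))
print(mae(pred, gt))
```

A lower MAE indicates that the predicted map deviates less from the mask on average, which is why the 5.4-point reduction over U-Net reflects stronger background suppression.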