
Computer Engineering, 2023, Vol. 49, Issue (5): 255-261, 268. doi: 10.19678/j.issn.1000-3428.0064616

• Graphics and Image Processing •


RGB-D Saliency Detection via Depth Quality Perception and Hierarchical Feature Guidance

SONG Mengke, ZHENG Yuanchao, CHEN Chenglizhao   

  1. College of Computer Science and Technology, Qingdao University, Qingdao 266071, Shandong, China
  • Received: 2022-05-05; Revised: 2022-06-20; Published: 2022-08-15
  • About the authors: SONG Mengke (born 1997), male, M.S. candidate; his main research interests are computer vision and RGB-D saliency detection. ZHENG Yuanchao, M.S. candidate. CHEN Chenglizhao, professor, Ph.D.
  • Funding: Doctoral Program of the Natural Science Foundation of Shandong Province (ZR2019BF011).


Abstract: Existing fusion-based RGB-D salient object detection methods ignore the differences between RGB and depth-map features when fusing cross-modal features. This unbalanced cross-modal fusion prevents the model from fully leveraging complementary cross-modal features, and low-quality depth maps further degrade model performance. This paper proposes an RGB-D salient object detection algorithm based on depth quality perception and hierarchical feature guidance. The algorithm comprises two stages: a depth quality perception stage and a hierarchical feature guidance stage. In the first stage, depth quality perception mines high-quality depth maps from existing mainstream RGB-D salient object detection training datasets to enhance the training sets; this process substantially improves the quality of low-quality depth maps and reduces the damage that noisy data inflicts on model performance. In the second stage, a feature guidance network performs hierarchical, adaptively weighted dynamic fusion of the RGB image and the depth map, which improves fusion efficiency while strengthening the perception capability of cross-modal fusion. Experimental results on five benchmark datasets (NJUD, NLPR, SSD, STEREO, and SIP) show that, compared with methods such as SSF, CDNet, D3Net, and DASNet, the proposed algorithm markedly improves depth-map quality; on the NLPR dataset it achieves an F-Measure of 0.934 with an MAE of only 0.020. Its overall performance surpasses that of related SOTA methods, demonstrating the effectiveness of first mining high-quality depth maps and then performing cross-modal adaptive dynamic fusion.
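The second stage's hierarchical, adaptively weighted dynamic fusion can be pictured with a small gated-fusion block. The sketch below is a minimal PyTorch illustration under assumed names and shapes, not the paper's actual network: the AdaptiveFusion module, its gating design, and the 64-channel example are all hypothetical.

```python
# Minimal sketch of adaptively weighted cross-modal fusion (an assumed
# design, not the paper's network). One such block per backbone level
# would give the hierarchical scheme described in the abstract.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuses same-level RGB and depth feature maps with a learned gate."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel weight from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        # Dynamic weighting: where depth is unreliable, lean on RGB.
        return w * rgb_feat + (1.0 - w) * depth_feat

fuse = AdaptiveFusion(channels=64)
out = fuse(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```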

Key words: depth quality perception, feature guidance, cross-modal fusion, hierarchical fusion, RGB-D saliency detection
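For reference, the two reported metrics (MAE and F-Measure) are standard in salient object detection. The NumPy sketch below assumes the common conventions, beta^2 = 0.3 for the F-Measure and an adaptive threshold of twice the mean saliency; the paper's exact evaluation protocol may differ.

```python
# Hedged sketch of MAE and adaptive-threshold F-measure for a predicted
# saliency map and a binary ground-truth mask, both scaled to [0, 1].
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between the prediction and the mask."""
    return float(np.abs(pred - gt).mean())

def f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """F-measure at an adaptive threshold of twice the mean saliency."""
    thresh = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))

pred = np.random.rand(224, 224)
gt = (np.random.rand(224, 224) > 0.5).astype(np.float64)
print(mae(pred, gt), f_measure(pred, gt))
```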

CLC Number: