
Computer Engineering, 2022, Vol. 48, Issue (7): 227-233, 240. doi: 10.19678/j.issn.1000-3428.0061811

• Graphics and Image Processing •

Light Field Salient Object Detection Based on Multi-modal Multi-level Feature Aggregation Network

WANG Anzhi, REN Chunhong, HE Linyan, YANG Yuanying, OU Weihua

  1. School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550025, China
  • Received: 2021-06-02 Revised: 2021-08-11 Online: 2022-07-15 Published: 2022-07-12
  • About the authors: WANG Anzhi (born 1986), male, associate professor, Ph.D., whose main research interests are computer vision and deep learning; REN Chunhong, HE Linyan, and YANG Yuanying, undergraduate students; OU Weihua, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (62162013, 61962010); Natural Science Foundation of Guizhou Province ([2017]1130, [2017]5726-32); Guizhou Normal University 2019 Doctoral Research Start-up Project (GZNUD[2018]32); Guizhou Province College Students' Innovation and Entrepreneurship Training Program (S202010663031); Guizhou Normal University Undergraduate Research Training Program (DK2019A059).

Abstract: Most existing deep learning-based saliency detection algorithms are designed for 2D RGB images and fail to exploit the 3D visual information of a scene, whereas most current light field saliency detection methods rely on hand-crafted features whose representation capacity is insufficient. As a result, both kinds of methods perform poorly on challenging natural scene images. To address these problems, this paper proposes a multi-modal multi-level feature aggregation network based on a convolutional neural network. The network refines and fuses multi-modal, multi-level features and uses the rich visual information in light field images to achieve accurate saliency detection on 4D light field images. To fully exploit 3D visual information, two parallel sub-networks are designed to process all-focus images and depth maps separately. On this basis, feature aggregation modules aggregate multi-level features, and cross-modal feature fusion modules fuse features from three modalities (all-focus images, focal stacks, and depth maps), so that salient objects in the scene are highlighted more effectively. Experimental comparisons on the DUTLF-FS and HFUT-Lytro light field benchmark datasets show that the algorithm outperforms mainstream salient object detection algorithms such as MOLF, AFNet, and DMRA on five authoritative evaluation metrics.

Key words: depth map, feature fusion, light field, aggregation network, Salient Object Detection (SOD)
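
The data flow described in the abstract (fuse features across modalities at each level, then aggregate across levels) can be illustrated with a minimal pure-Python sketch. This is not the authors' network: the abstract does not specify the fusion or aggregation operators, so an element-wise mean and an element-wise sum stand in for the learned modules, feature maps are flattened lists of floats, and all function names and modality keys are hypothetical.

```python
import math
from typing import Dict, List

# A flattened feature map. Real networks use 4D tensors; lists keep the sketch dependency-free.
Feature = List[float]


def fuse_modalities(features: Dict[str, Feature]) -> Feature:
    """Fuse same-level features from several modalities.

    Assumption: an element-wise mean stands in for the learned
    cross-modal fusion module, whose operator the abstract does not give.
    """
    maps = list(features.values())
    length = len(maps[0])
    if any(len(m) != length for m in maps):
        raise ValueError("all modalities must share one feature shape")
    return [sum(values) / len(maps) for values in zip(*maps)]


def aggregate_levels(levels: List[Feature]) -> Feature:
    """Aggregate fused features across levels, starting from the deepest.

    Assumption: an element-wise sum stands in for the learned multi-level
    aggregation, and every level is already resized to one common shape.
    """
    out = list(levels[-1])
    for level in reversed(levels[:-1]):
        out = [a + b for a, b in zip(out, level)]
    return out


def predict_saliency(pyramid: List[Dict[str, Feature]]) -> Feature:
    """Map per-level, per-modality features to saliency scores in (0, 1)."""
    fused = [fuse_modalities(level) for level in pyramid]
    aggregated = aggregate_levels(fused)
    return [1.0 / (1.0 + math.exp(-v)) for v in aggregated]


# Two feature levels, three modalities, two "pixels" per map (toy values).
level = {
    "all_focus": [1.0, -1.0],
    "focal_stack": [2.0, -2.0],
    "depth": [3.0, -3.0],
}
saliency = predict_saliency([level, level])
```

In this toy input, the first position has positive responses in all three modalities and so receives a saliency score above 0.5, while the second receives a score below 0.5. In a trained network the mean and sum would be replaced by learned convolutional modules.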

CLC number: