Computer Engineering ›› 2022, Vol. 48 ›› Issue (2): 215-223. doi: 10.19678/j.issn.1000-3428.0060268

• Graphics and Image Processing •

Video Saliency Detection Based on Multi-Stream Network Consistency

SONG Jia, CHEN Chenglizhao

  1. School of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Received: 2020-12-14 Revised: 2021-02-04 Published: 2021-02-24
  • About the authors: SONG Jia (born 1994), female, master's student; her main research interest is video salient object detection. CHEN Chenglizhao, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61802215, 61806106).

Abstract: Existing video saliency detection algorithms usually adopt a two-stream structure to extract spatio-temporal cues from video. The motion branch of this structure estimates motion inaccurately when salient objects move violently or slowly, and unreasonable training data or training schemes bias the network weights toward a single branch. This paper proposes MSNC, a video saliency detection algorithm based on multi-stream network consistency. A new triple-network structure is designed to extract the color information, temporal information, and prior features of the preselected target region; the prior features compensate for the defects of the motion stream and improve the utilization of motion cues. A multi-stream consistency fusion model optimizes the three stream branches to obtain the best fusion scheme for the different features. A cyclic training strategy balances the weights of the triple network so that the network does not overfit a single stream branch, which effectively improves the accuracy of motion estimation and localization. Experimental results on the Davis dataset show that the algorithm is more robust than PCSA, SSAV, and MGA, reaching a maxF of 0.893 and an S-Measure of 0.912 with an MAE of only 0.021.

Key words: video saliency detection, motion information, prior information, multi-stream consistency fusion, channel attention mechanism
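
The abstract reports maxF and MAE without defining them. As a rough illustration, the sketch below computes both for a single saliency map, following the definitions commonly used in saliency benchmarks. It is not the authors' evaluation code: the function names, the flattened-list input format, the 256 evenly spaced thresholds, and the weighting β² = 0.3 are all assumptions made here. Benchmark toolkits also typically average precision and recall over all frames before taking the maximum, which this per-map version does not do. S-Measure is omitted because its definition is more involved.

```python
from typing import Sequence

BETA_SQ = 0.3  # assumed weighting; a common convention, not stated in the abstract


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a saliency map in [0, 1] and a binary mask.

    Both inputs are flattened pixel sequences of equal length.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def max_f_measure(pred: Sequence[float], gt: Sequence[int], steps: int = 255) -> float:
    """Largest F-beta score over evenly spaced binarization thresholds."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    positives = sum(1 for g in gt if g)
    best = 0.0
    for k in range(steps + 1):
        threshold = k / steps
        predicted = [p >= threshold for p in pred]
        true_pos = sum(1 for hit, g in zip(predicted, gt) if hit and g)
        if true_pos == 0:
            continue  # precision and recall are both zero, so F is zero
        precision = true_pos / sum(predicted)
        recall = true_pos / positives
        f_beta = (1 + BETA_SQ) * precision * recall / (BETA_SQ * precision + recall)
        best = max(best, f_beta)
    return best


if __name__ == "__main__":
    saliency = [0.9, 0.8, 0.2, 0.1]  # toy 4-pixel prediction
    mask = [1, 1, 0, 0]              # toy ground-truth mask
    print("MAE:", mae(saliency, mask))
    print("maxF:", max_f_measure(saliency, mask))
```

A lower MAE and a higher maxF both indicate closer agreement with the ground-truth mask, which is the sense in which the reported 0.021 and 0.893 are read.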

CLC Number: