Computer Engineering ›› 2020, Vol. 46 ›› Issue (10): 275-281, 288. doi: 10.19678/j.issn.1000-3428.0056292

• Graphics and Image Processing •

Semantic Segmentation Model Based on Binocular Images and Guidance of Cross-Level Features

ZHANG Di (张娣), LU Jianfeng (陆建峰)

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2019-10-14 Revised: 2019-11-23 Published: 2019-12-02
  • About the authors: ZHANG Di (born 1994), female, master's student, research interests: binocular vision and semantic segmentation; LU Jianfeng, professor.
  • Funding:
    National Key Research and Development Program of China (2017YFB1300205).

Abstract: To improve how semantic segmentation networks for monocular images handle regions where image depth varies, this paper proposes a semantic segmentation model that makes complementary use of the depth information in binocular images and of cross-level features. Without changing its structure, the existing monocular twin network is used to extract two-dimensional information from the left and right input images. Building on ParallelNet, a color-depth fusion module is designed that computes the similarity of binocular image feature points at different disparity levels to extract depth information, and fuses it with the two-dimensional information to obtain depth features. In addition, a cross-level feature attention module, guided by high-level semantic information, obtains accurate low-level category boundary information, which improves the utilization of features at each scale and the accuracy in edge regions. Experimental results show that, compared with the traditional ParallelNet binocular baseline model, the proposed model increases the mean Intersection over Union (mIoU) and the Pixel Accuracy (PA) by 3.67 and 3.32 percentage points respectively, and segments similar-looking regions such as fences and traffic signs in finer detail and more accurately.
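The abstract describes computing the similarity of left and right feature points at different disparity levels and fusing the result with two-dimensional features. The sketch below illustrates that general idea with a plain correlation volume in NumPy. The function names, the channel-mean dot product used as the similarity, and the concatenation used for fusion are all assumptions made for illustration; the paper's actual fusion module may differ.

```python
import numpy as np


def disparity_similarity(left_feat, right_feat, max_disp):
    """Similarity of left/right features at each disparity level (hypothetical helper).

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns an array of shape (max_disp + 1, H, W). Entry [d, y, x] is the
    channel-mean dot product of left[:, y, x] and right[:, y, x - d].
    Positions with x < d have no counterpart and stay at 0.
    """
    _, height, width = left_feat.shape
    volume = np.zeros((max_disp + 1, height, width), dtype=np.float64)
    for d in range(min(max_disp, width - 1) + 1):
        # Shift the right view by d pixels and compare it with the left view.
        product = left_feat[:, :, d:] * right_feat[:, :, : width - d]
        volume[d, :, d:] = product.mean(axis=0)
    return volume


def fuse_color_depth(left_feat, volume):
    """Assumed fusion: stack 2D features and the similarity volume along channels."""
    return np.concatenate([left_feat, volume], axis=0)
```

For a rectified pair, the disparity level with the highest similarity at a pixel hints at its depth, which is why such a volume can serve as a depth cue alongside the 2D features.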

Key words: semantic segmentation, binocular image, depth information, cross-level feature, attention
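The results above are reported as mean Intersection over Union (mIoU) and Pixel Accuracy (PA). As a reminder of how these two metrics are commonly computed, here is a minimal sketch based on a confusion matrix. The function names are hypothetical, and skipping classes that never appear is one common convention; this is not claimed to be the paper's evaluation code.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Count pixels per (true class, predicted class) pair.

    pred and target are integer label maps of the same shape.
    """
    index = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    true_pos = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - true_pos
    present = union > 0
    return (true_pos[present] / union[present]).mean()


# Tiny usage example with two classes on a 2x2 label map.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

A gain stated in percentage points, as in the abstract, is the absolute difference between two such scores expressed in percent, not a relative improvement.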

CLC Number: