
Computer Engineering, 2023, Vol. 49, Issue (6): 217-226. doi: 10.19678/j.issn.1000-3428.0064695

• Graphics and Image Processing •

基于联合注意与特征关联的实例分割算法

周逸云1,2, 万新军1,2, 胡伏原1,3, 陈昊1,2   

  1. 苏州科技大学 电子与信息工程学院, 江苏 苏州 215009;
    2. 苏州市虚拟现实智能交互及应用技术重点实验室, 江苏 苏州 215009;
    3. 苏州市大数据与信息服务重点实验室, 江苏 苏州 215009
  • 收稿日期:2022-05-13 修回日期:2022-07-11 发布日期:2022-09-20
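  • About the authors: ZHOU Yiyun (born 1998), female, master's student, whose main research interests are computer vision and image instance segmentation; WAN Xinjun, master's student; HU Fuyuan (corresponding author), professor, Ph.D.; CHEN Hao, master's student.
  • Funding:
    National Natural Science Foundation of China (61876121).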

Instance Segmentation Algorithm Based on Joint Attention and Feature Association

ZHOU Yiyun1,2, WAN Xinjun1,2, HU Fuyuan1,3, CHEN Hao1,2   

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China;
    2. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou 215009, Jiangsu, China;
    3. Key Laboratory of Big Data and Information Services of Suzhou, Suzhou 215009, Jiangsu, China
  • Received: 2022-05-13 Revised: 2022-07-11 Published: 2022-09-20

摘要: 针对现有实例分割算法因目标特征表示不充分、模型捕获信息不完整等因素导致分割精度较低的问题,提出一种基于联合注意和特征关联的实例分割算法。该算法采用联合注意力机制,沿通道和空间两个不同维度对感兴趣区域特征进行权重优化,聚焦关键对象位置,实现目标特征表示,抑制冗余信息对实例检测和分割结果干扰。在此基础上,在分割阶段建立特征关联关系,充分挖掘实例内部各像素点相似性,加强网络对实例部分的细节特征感知,实现高质量的掩膜预测。此外,通过引入协调损失函数监督检测中分类和回归任务产生一致预测,提高目标对象检测的准确性,进一步提升分割性能。在MS COCO 2017和Cityscapes两个数据集上进行实验验证,结果表明:该算法能够有效提高各现实场景下实例的检测和分割质量。当主干网络为ResNet-50/101时,该算法在COCO数据集上的掩膜平均精度分别达到37.5%和38.6%,较基线方法Mask R-CNN分别提高1.9和2.4个百分点;在Cityscapes验证集和测试集上,该算法较Mask R-CNN在主干网络为ResNet-50时分别提高2.4和2.5个百分点。

关键词: 计算机视觉, 实例分割, 联合注意, 特征关联, 掩膜预测

Abstract: To address the low segmentation accuracy that existing instance segmentation algorithms suffer from because of insufficient target feature representation and incomplete information capture, an instance segmentation algorithm based on joint attention and feature association is proposed. The algorithm applies a joint attention mechanism that reweights Region of Interest (ROI) features along two dimensions, channel and space, to focus on the locations of key objects, build the target feature representation, and suppress the interference of redundant information with instance detection and segmentation. On this basis, feature associations are established in the segmentation stage to fully mine the similarity among pixels within an instance, which enhances the network's perception of instance details and yields high-quality mask predictions. In addition, a coordination loss function is introduced to supervise the classification and regression tasks in detection so that they produce consistent predictions, which improves the accuracy of object detection and further improves segmentation performance. Experiments on two datasets, MS COCO 2017 and Cityscapes, show that the proposed algorithm effectively improves the detection and segmentation quality of instances in various real-world scenes. With ResNet-50 and ResNet-101 backbones, the mask average precision of the algorithm on the COCO dataset reaches 37.5% and 38.6%, respectively, which is 1.9 and 2.4 percentage points higher than the baseline method Mask R-CNN. On the Cityscapes validation and test sets, with a ResNet-50 backbone, the algorithm improves on Mask R-CNN by 2.4 and 2.5 percentage points, respectively.

Key words: computer vision, instance segmentation, joint attention, feature association, mask prediction
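
The abstract describes the joint attention mechanism only at a high level: ROI features are reweighted along the channel and spatial dimensions. The sketch below illustrates one common way to do this, a CBAM-style channel-then-spatial gate written in NumPy. It is an assumption made for illustration and may not match the authors' design; the function names, the shared two-layer MLP, the 7x7 kernel, and the random weights are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Return per-channel weights of shape (C, 1, 1) for feat of shape (C, H, W).

    Assumed design: average- and max-pool over space, pass both vectors
    through a shared two-layer MLP (w1, w2), sum, then apply a sigmoid.
    """
    avg = feat.mean(axis=(1, 2))          # (C,)
    mx = feat.max(axis=(1, 2))            # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(feat, kernel):
    """Return per-pixel weights of shape (1, H, W) for feat of shape (C, H, W).

    Assumed design: average- and max-pool over channels, then slide a
    (2, k, k) kernel (k odd, zero padding) over the 2-channel map.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None, :, :]

def joint_attention(feat, w1, w2, kernel):
    """Apply the channel gate first, then the spatial gate."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)

# Usage on a random stand-in for one ROI feature map (sizes are illustrative).
rng = np.random.default_rng(0)
C, H, W = 8, 14, 14
roi = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 4, C)) * 0.1   # hypothetical MLP weights
w2 = rng.standard_normal((C, C // 4)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = joint_attention(roi, w1, w2, kernel)
print(refined.shape)
```

Because both gates are sigmoid outputs, each refined value is the original value scaled by a factor between 0 and 1, so the block can only attenuate features, which matches the stated goal of suppressing redundant information. In a trained network the weights would be learned rather than sampled at random.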
