Instance Segmentation Algorithm Based on Joint Attention and Feature Association

doi:10.19678/j.issn.1000-3428.0064695

Abstract

Abstract: Existing instance segmentation algorithms suffer from low segmentation accuracy because target features are insufficiently represented and the model captures incomplete information. To address this problem, an instance segmentation algorithm based on joint attention and feature association is proposed. The algorithm uses a joint attention mechanism to reweight Region of Interest (ROI) features along two dimensions, channel and space, which focuses on the locations of key objects, realizes the target feature representation, and suppresses the interference of redundant information with instance detection and segmentation. On this basis, feature associations are established in the segmentation stage to fully mine the similarity among the pixels within an instance, which strengthens the network's perception of fine instance details and yields high-quality mask predictions. In addition, a coordination loss function is introduced to supervise the classification and regression tasks in detection so that they produce consistent predictions, which improves object detection accuracy and further improves segmentation performance. Experiments on two datasets, MS COCO 2017 and Cityscapes, show that the algorithm effectively improves the detection and segmentation quality of instances in a variety of real-world scenes. With ResNet-50 and ResNet-101 backbones, the algorithm reaches a mask average precision of 37.5% and 38.6% on the COCO dataset, which is 1.9 and 2.4 percentage points higher than the baseline Mask R-CNN, respectively. On the Cityscapes validation and test sets, with a ResNet-50 backbone, the algorithm improves on Mask R-CNN by 2.4 and 2.5 percentage points, respectively.

Key words: computer vision, instance segmentation, joint attention, feature association, mask prediction
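
The abstract describes the joint attention step only at a high level: ROI features are reweighted first along the channel dimension and then along the spatial dimension. The following is a minimal, parameter-free sketch of that idea, not the authors' implementation. It assumes a CBAM-like ordering (channel, then space) and replaces the learned layers that a real network would use with a plain sigmoid over average-pooled plus max-pooled statistics. The function names `channel_attention`, `spatial_attention`, and `joint_attention` are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Return one weight in (0, 1) per channel.

    feat is a list of C channels, each an H x W nested list.
    Assumption: the weight comes from the channel's mean plus its max;
    a trained model would pass these statistics through learned layers.
    """
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values) + max(values)))
    return weights


def spatial_attention(feat):
    """Return an H x W map of weights in (0, 1).

    Assumption: each weight comes from the mean plus the max of the
    values found at that pixel across all channels.
    """
    num_channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    weights = []
    for i in range(height):
        row = []
        for j in range(width):
            values = [feat[c][i][j] for c in range(num_channels)]
            row.append(sigmoid(sum(values) / num_channels + max(values)))
        weights.append(row)
    return weights


def joint_attention(feat):
    """Reweight features along the channel axis, then the spatial axis."""
    cw = channel_attention(feat)
    scaled = [[[v * cw[c] for v in row] for row in channel]
              for c, channel in enumerate(feat)]
    sw = spatial_attention(scaled)
    return [[[v * sw[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in scaled]


# Toy ROI feature with 2 channels of size 2 x 2 (made-up values).
roi = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
refined = joint_attention(roi)
```

Because every weight lies strictly between 0 and 1, the output keeps the input's shape and only attenuates values, with weakly activated channels and positions attenuated more. In a trained model the weights would be learned, so this sketch illustrates the data flow only.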

CLC number: TP391

ZHOU Yiyun, WAN Xinjun, HU Fuyuan, CHEN Hao. Instance Segmentation Algorithm Based on Joint Attention and Feature Association[J]. Computer Engineering, 2023, 49(6): 217-226.

https://www.ecice06.com/CN/Y2023/V49/I6/217

