Image Instance Segmentation Method Based on Class-imbalanced Dataset

doi:10.19678/j.issn.1000-3428.0063741

Abstract

Abstract: As deep learning has advanced computer vision, datasets covering many categories have been proposed continually. Naturally collected datasets, however, are often class-imbalanced and follow a long-tailed distribution, so the features of rare classes are suppressed by those of frequent classes, which seriously degrades the detection performance of the model. To address this problem, this paper proposes an image instance segmentation method and evaluates it on a long-tailed instance segmentation dataset. First, the dataset is processed with a data augmentation method based on target scale, which expands the training samples and increases the number of rare-class targets. Rare-class data are also resampled to compensate for the very small amount of data in those classes, improving the robustness of the model on long-tailed datasets. On this basis, the equalization loss function is integrated into the Mask Region-based Convolutional Neural Network (Mask R-CNN) instance segmentation network to reduce the suppression of rare-class features by frequent-class features. In experiments on the Large Vocabulary Instance Segmentation (LVIS) dataset, the proposed method improves detection accuracy by 4.9%, reaching 25.7%, while AP_r, AP_c, and AP_f (AP on rare, common, and frequent categories) reach 16.2%, 26.1%, and 30.4%, respectively, all clearly higher than the Baseline method. Ablation experiments also indicate that the method effectively mitigates the long-tailed distribution problem.

Key words: long-tailed distribution, instance segmentation, data augmentation, loss function, deep learning
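
The abstract names an equalization loss but does not give its form. As a rough illustration only, the sketch below assumes it follows the equalization loss of Tan et al. (CVPR 2020): a sigmoid cross-entropy in which a rare category is not penalized as a negative on foreground proposals that belong to other categories. The function name, the argument names, and the threshold value `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def equalization_loss(logits, labels, class_freq, is_foreground, lam=1e-3):
    """Sketch of an equalization-style loss (assumed form, not the authors' code).

    logits:        (N, C) raw classification scores, one row per proposal
    labels:        (N, C) one-hot targets; all zeros for background proposals
    class_freq:    (C,)   frequency of each category in the training set
    is_foreground: (N,)   True if the proposal is matched to an object
    lam:           frequency threshold below which a category counts as rare
                   (illustrative value, assumed)
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)

    # T(f_j): 1 for rare categories, 0 otherwise
    rare = (np.asarray(class_freq, dtype=np.float64) < lam).astype(np.float64)
    # E(r): 1 for foreground proposals, 0 for background proposals
    fg = np.asarray(is_foreground, dtype=np.float64)[:, None]

    # w_j = 1 - E(r) * T(f_j) * (1 - y_j): drop the negative-sample term
    # of rare categories on foreground proposals of other categories
    weights = 1.0 - fg * rare[None, :] * (1.0 - labels)

    # Numerically stable sigmoid cross-entropy per (proposal, category)
    bce = (np.maximum(logits, 0.0) - logits * labels
           + np.log1p(np.exp(-np.abs(logits))))

    # Average over proposals
    return float((weights * bce).sum() / logits.shape[0])


# Two proposals, three categories; category 1 is rare under lam=1e-3.
example_logits = np.zeros((2, 3))
example_labels = np.array([[1.0, 0.0, 0.0],   # foreground, category 0
                           [0.0, 0.0, 0.0]])  # background
example_freq = np.array([0.5, 0.0005, 0.2])
example_fg = np.array([True, False])
example_loss = equalization_loss(example_logits, example_labels,
                                 example_freq, example_fg)
```

In this toy example the rare category contributes no negative term for the foreground proposal but is still penalized on the background proposal, which is the behavior that keeps frequent-class samples from pushing down rare-class scores.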

CLC number: TP391.41

FAN Xinyue, BAO Hong, PAN Weiguo. Image Instance Segmentation Method Based on Class-imbalanced Dataset[J]. Computer Engineering, 2022, 48(12): 224-231.

https://www.ecice06.com/CN/Y2022/V48/I12/224

Figures/Tables: 7

