Self-Paced Semi-Supervised Dimensionality Reduction for Data with Noisy Labels

doi:10.19678/j.issn.1000-3428.0067397

Abstract:

Data labeling is time-consuming and laborious, and label quality directly affects the predictive performance of a model. Based on the Self-Paced Learning (SPL) mechanism, a Self-Paced Semi-Supervised Dimensionality Reduction (SPSSDR) framework is constructed that gradually incorporates samples into model training, from simple to complex. Under this framework, an SPSSDR algorithm is designed that follows an alternating optimization strategy, iterating between updating the dimensionality reduction mapping and computing sample importance. On the one hand, the mapping is obtained by minimizing the weighted intra-class dispersion of the low-dimensional labeled data, together with a function complexity regularization term in the reproducing kernel Hilbert space and a smoothness regularization term on the sparse structure graph of the data. On the other hand, following the SPL mechanism, the distance between the low-dimensional representation of each labeled sample and the anchor of its class is computed to set the importance of that sample in the next iteration. The proposed framework and algorithm are robust to label noise and adaptively provide both the importance of labeled samples and an explicit nonlinear dimensionality reduction mapping; the resulting low-dimensional representations have strong separability and discriminability. On five experimental datasets with noisy labels, the nearest neighbor classification accuracy of the low-dimensional representations obtained by the proposed algorithm exceeds that of the second-best algorithm by up to 2.2, 5.6, 5.0, 11.3, and 2.7 percentage points, respectively, which verifies the effectiveness and robustness of the proposed algorithm.

Keywords: semi-supervised dimensionality reduction, Self-Paced Learning (SPL), mapping, sparse representation, feature extraction
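
The alternation described in the abstract (update the mapping with the current sample importances, then recompute the importances from distances to class anchors) can be illustrated with a heavily simplified sketch. It is not the SPSSDR algorithm itself: the abstract does not give the objective or the weighting rule, so the sketch assumes a linear ridge mapping in place of the kernel mapping, one-hot vectors as class anchors, a hard-threshold self-paced rule, and no unlabeled data or graph smoothness term. All function names and parameters (`fit_mapping`, `self_paced_weights`, `self_paced_reduction_sketch`, `lam`, `growth`, `gamma`) are hypothetical.

```python
import numpy as np


def fit_mapping(X, A, v, gamma=1e-2):
    """Weighted ridge fit: argmin_W sum_i v_i * ||x_i W - a_i||^2 + gamma * ||W||_F^2.

    Assumption: a linear map stands in for the kernel mapping of the paper.
    """
    Xv = X * v[:, None]  # scale each row by its importance v_i
    d = X.shape[1]
    return np.linalg.solve(X.T @ Xv + gamma * np.eye(d), Xv.T @ A)


def self_paced_weights(Z, A, lam):
    """Hard-threshold self-paced rule (an assumption, not taken from the paper).

    A sample gets weight 1 if its squared distance to its class anchor
    is below the threshold lam, and weight 0 otherwise.
    """
    loss = np.sum((Z - A) ** 2, axis=1)
    v = (loss < lam).astype(float)
    if not v.any():
        # Guard: if nothing passes the threshold, keep the easier half.
        v = (loss <= np.median(loss)).astype(float)
    return v


def self_paced_reduction_sketch(X, y, n_classes, lam=0.5, growth=1.1,
                                n_iter=5, gamma=1e-2):
    """Alternate between the mapping update and the importance update."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = np.eye(n_classes)[y]  # one-hot anchor per sample (assumption)
    v = np.ones(X.shape[0])   # start by trusting every label
    for _ in range(n_iter):
        W = fit_mapping(Xb, A, v, gamma)        # step 1: update the mapping
        v = self_paced_weights(Xb @ W, A, lam)  # step 2: update importances
        lam *= growth  # admit harder samples in later iterations
    return W, v
```

In this sketch a sample whose label disagrees with its neighborhood tends to land far from its class anchor, so it receives weight 0 and stops influencing the next mapping update. The threshold `lam` grows slowly; if it grew without bound, every sample would eventually be admitted again, so the growth rate and iteration count would need tuning in practice.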

Nannan GU. Self-Paced Semi-Supervised Dimensionality Reduction for Data with Noisy Labels[J]. Computer Engineering, 2023, 49(11): 131-142.

http://www.ecice06.com/CN/Y2023/V49/I11/131
