Computer Engineering, 2019, Vol. 45, Issue (4): 157-162, 168. doi: 10.19678/j.issn.1000-3428.0050398

• Artificial Intelligence and Recognition Technology •

Noise Tolerant Label Combination Semi-supervised Learning Algorithm

LIN Jinchuan, AI Haojun

  1. School of Computer Science, Wuhan University, Wuhan 430072, China
  • Received: 2018-02-02 Online: 2019-04-15 Published: 2019-04-15
  • About the authors: LIN Jinchuan (born 1992), male, master's student, whose main research interests are transfer learning and complex networks; AI Haojun, associate professor.
  • Supported by:

    National Key Research and Development Program of China (2016YFB0502201).

Abstract:

Traditional machine learning methods for classification often require costly manual labeling and generalize poorly. To address these problems, a label combination semi-supervised learning algorithm is proposed. Following the idea of ensemble learning, the algorithm trains multiple weak learners on the labeled data and combines them to improve generalization. It then predicts the unlabeled data to generate noisy labels, and combines and models these noisy labels to make the model more robust. Under the risk minimization framework, the model is driven to converge to the optimum. Experimental results show that, in two supervised scenarios, the algorithm has relatively better generalization ability than existing algorithms such as Support Vector Machine (SVM), Classification and Regression Tree (CART), and Neural Network (NN).

Key words: semi-supervised learning, ensemble learning, risk minimization, gradient descent, loss function
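The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weak learners, the loss function, or how the noisy labels are modeled. As assumptions made purely for illustration, the sketch uses decision stumps trained on deterministic subsets of the labeled data, fits the combination weights by gradient descent on a logistic loss, and retrains on the labeled data plus the combined labels. All function names (`train_stump`, `train_weak_learners`, `fit_weights`, `combine`, `fit_semi_supervised`) are hypothetical.

```python
import math


def train_stump(X, y):
    """Fit a one-feature threshold rule (a weak learner); labels are -1 or +1."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum(
                    1 for row, label in zip(X, y)
                    if (sign if row[j] > t else -sign) != label
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: sign if row[j] > t else -sign


def train_weak_learners(X, y, n_learners):
    """Train learner k on every sample whose index i satisfies i % n_learners != k."""
    learners = []
    for k in range(n_learners):
        keep = [i for i in range(len(X)) if i % n_learners != k]
        learners.append(train_stump([X[i] for i in keep], [y[i] for i in keep]))
    return learners


def fit_weights(learners, X, y, lr=0.1, epochs=200):
    """Gradient descent on the mean logistic loss of the weighted vote (assumed loss)."""
    weights = [0.0] * len(learners)
    votes = [[h(row) for h in learners] for row in X]
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for vote, label in zip(votes, y):
            margin = label * sum(w * v for w, v in zip(weights, vote))
            coeff = -label / (1.0 + math.exp(margin))  # d/d(score) of log(1 + e^-margin)
            for k, v in enumerate(vote):
                grad[k] += coeff * v / len(X)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def combine(learners, weights, row):
    """Weighted vote of the weak learners; returns -1 or +1."""
    score = sum(w * h(row) for w, h in zip(weights, learners))
    return 1 if score >= 0 else -1


def fit_semi_supervised(X_lab, y_lab, X_unlab, n_learners=3):
    """Train on labeled data, label the unlabeled data, then retrain on both."""
    learners = train_weak_learners(X_lab, y_lab, n_learners)
    weights = fit_weights(learners, X_lab, y_lab)
    # Each weak learner gives its own, possibly wrong, label per unlabeled sample.
    noisy = [[h(row) for h in learners] for row in X_unlab]
    # The noisy labels are combined into one label per unlabeled sample.
    pseudo = [combine(learners, weights, row) for row in X_unlab]
    # Retrain on labeled plus combined labels; weights are fit on labeled data only.
    learners = train_weak_learners(X_lab + X_unlab, y_lab + pseudo, n_learners)
    weights = fit_weights(learners, X_lab, y_lab)
    return (lambda row: combine(learners, weights, row)), noisy, pseudo
```

Calling `fit_semi_supervised(X_lab, y_lab, X_unlab)` with lists of feature rows and labels in {-1, +1} returns a prediction function together with the per-learner noisy labels and the combined labels, so the disagreement among weak learners can be inspected directly. The deterministic subset split replaces the random resampling an ensemble would normally use, only to keep the sketch reproducible.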

CLC Number: