
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Active Learning Algorithm Based on Tri-training

ZHANG Yan1,2, WU Bao-guo1, LV Dan-ju2, LIN Ying3

  (1. School of Information, Beijing Forestry University, Beijing 100083, China; 2. School of Computer and Information, Southwest Forestry University, Kunming 650224, China; 3. School of Software, Yunnan University, Kunming 650091, China)
  • Received: 2013-02-28 Publication date: 2014-06-15 Release date: 2014-06-13
  • About the authors: ZHANG Yan (1973-), female, associate professor and Ph.D. candidate; main research interests: machine learning, intelligent information processing. WU Bao-guo (corresponding author), professor and Ph.D. supervisor. LV Dan-ju, associate professor and Ph.D. candidate. LIN Ying, associate professor, Ph.D.
  • Funding:
    Scientific Research Fund of Yunnan Provincial Department of Education (2010Y290, 2012C098).

Abstract: Semi-supervised learning and active learning both exploit unlabeled data to improve the recognition performance of supervised learning while keeping the cost of data labeling low. This paper combines active learning with the semi-supervised Tri-training algorithm and proposes a new classification algorithm in which the samples for active learning are selected by Entropy Priority Sampling (EPS). Experiments on UCI datasets and remote sensing image data, under different proportions of labeled training samples, show that the algorithm achieves comparatively good results when few labeled samples are available. Combining active learning with Tri-training is therefore an effective way to improve classification performance and generalization.

Key words: semi-supervised learning, active learning, Tri-training algorithm, Entropy Priority Sampling (EPS), Tri-EPS algorithm
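The abstract describes the method only at a high level, so the following is a minimal sketch of how a Tri-training loop might be combined with entropy-based sample selection, not the authors' implementation. Several assumptions are made and should be treated as such: EPS is taken to mean ranking unlabeled samples by the entropy of the three classifiers' vote distribution and querying an oracle for the k highest; the base learner is a toy nearest-centroid classifier; bootstraps are stratified by class; and the error-rate conditions that the original Tri-training algorithm checks before accepting pseudo-labels are omitted. All function names (`eps_select`, `tri_eps`, etc.) and parameters (`rounds`, `k`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_distribution(votes, classes):
    """Fraction of the three classifiers voting for each class."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def eps_select(pool_votes, classes, k):
    """Assumed form of entropy priority sampling: indices of the k pool
    samples whose vote distribution has the highest entropy."""
    scores = [entropy(vote_distribution(v, classes)) for v in pool_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def train_centroids(data):
    """Toy base learner (hypothetical): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in data:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(model, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def ensemble_predict(models, x):
    """Majority vote of the three classifiers."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


def stratified_bootstrap(data, rng):
    """Resample with replacement inside each class so every class survives."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append((x, y))
    return [rng.choice(items) for items in by_class.values() for _ in items]


def tri_eps(labeled, unlabeled, oracle, rounds=3, k=2, seed=0):
    """Hypothetical Tri-training + EPS loop; returns (models, queried)."""
    rng = random.Random(seed)
    classes = sorted({y for _, y in labeled})
    pool, queried = list(unlabeled), []
    boots = [stratified_bootstrap(labeled, rng) for _ in range(3)]
    models = [train_centroids(b) for b in boots]
    for _ in range(rounds):
        if not pool:
            break
        votes = [[predict(m, x) for m in models] for x in pool]
        # Active learning step: the oracle labels the k most uncertain samples.
        picked = set(eps_select(votes, classes, k))
        queried += [(pool[i], oracle(pool[i])) for i in sorted(picked)]
        rest = [(x, v) for i, (x, v) in enumerate(zip(pool, votes)) if i not in picked]
        pool = [x for x, _ in rest]
        # Tri-training step: classifier i also learns from the samples on which
        # the other two classifiers agree, using their shared label.
        models = [
            train_centroids(boots[i] + queried + [(x, v[j]) for x, v in rest if v[j] == v[t]])
            for i, (j, t) in enumerate([(1, 2), (0, 2), (0, 1)])
        ]
    return models, queried


# Usage on a toy 1-D problem with two well-separated classes.
labeled = [((0.0,), 0), ((1.0,), 0), ((9.0,), 1), ((10.0,), 1)]
unlabeled = [(0.5,), (1.5,), (2.0,), (3.0,), (7.0,), (8.0,), (8.5,), (9.5,)]
models, queried = tri_eps(labeled, unlabeled, oracle=lambda x: 0 if x[0] < 5 else 1)
print(len(queried), ensemble_predict(models, (0.2,)), ensemble_predict(models, (9.8,)))  # 6 0 1
```

Ranking by vote entropy sends the samples on which the three classifiers disagree most to the human annotator, while samples on which two classifiers agree are pseudo-labeled for the third at no labeling cost; this is one plausible reading of how the two ideas complement each other.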
