
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Tri-Training Algorithm Based on Feature Transformation

ZHAO Wen-liang, GUO Hua-ping, FAN Ming

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450052, China)
  • Received: 2013-04-22 Online: 2014-05-15 Published: 2014-05-14
  • About the authors: ZHAO Wen-liang (born 1989), male, master's student, main research interests: data mining and machine learning; GUO Hua-ping, doctoral student; FAN Ming, professor.

Abstract: This paper proposes a Tri-Training algorithm based on feature transformation. The algorithm maps the labeled instance set into new spaces through feature transformation to obtain diverse training sets, from which it builds base classifiers that are both accurate and diverse. This avoids the weakness of bootstrap sampling, which cannot make full use of the entire labeled instance set. To make full use of the class distribution information of the data, the paper designs a transformation method called Transformation Based on Must-link Constraints and Cannot-link Constraints (TMC) and applies it to the proposed Tri-Training algorithm. Experimental results on UCI data sets show that, at different unlabeled rates, the feature-transformation-based Tri-Training algorithm achieves higher accuracy than the classic Co-Training and Tri-Training algorithms on most data sets. In addition, the TMC-based Tri-Training algorithm generalizes better than the Tri-LDA and Tri-CP algorithms.

Key words: feature transformation, labeled instance set, diversity, bootstrap sampling, generalization ability
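
The following is a minimal sketch of two ideas mentioned in the abstract, not the authors' implementation. It assumes the usual definitions: a Must-link pair joins two labeled instances of the same class, and a Cannot-link pair joins two labeled instances of different classes. It also estimates how much of a labeled set a single bootstrap sample actually touches, which is the weakness the abstract attributes to bootstrap sampling. The function names `build_constraints` and `bootstrap_coverage` are hypothetical, and the way TMC selects pairs and derives its transformation is not described in the abstract, so it is not reproduced here.

```python
from itertools import combinations
import random


def build_constraints(labels):
    """Split all index pairs of a labeled set into Must-link and Cannot-link.

    Assumption: same class label -> Must-link, different labels -> Cannot-link.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def bootstrap_coverage(n, seed=0):
    """Fraction of distinct instances drawn by one bootstrap sample of size n."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}  # sampling with replacement
    return len(drawn) / n


# Toy labeled set with two classes (illustrative data only).
labels = ["a", "a", "b", "b", "a"]
must_link, cannot_link = build_constraints(labels)
print("Must-link pairs:", must_link)
print("Cannot-link pairs:", cannot_link)
print("Bootstrap coverage for n=1000:", bootstrap_coverage(1000))
```

Because a bootstrap sample leaves a sizable share of the labeled instances unused, a scheme that instead trains every base classifier on all labeled instances, each in a differently transformed space, can obtain diversity without discarding labeled data.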
