Abstract: During classifier training, introducing unlabeled data tends to add noise, which lowers classification accuracy. To address this, the paper proposes a graph-based Confidence Estimation for Semi-supervised Learning (CESL) co-training algorithm. The algorithm uses the structural information of the sample data to compute, explicitly, the probability that each unlabeled sample belongs to each class. It also combines multiple classifiers to estimate the confidence of unlabeled data implicitly, which tightens the selection criterion and reduces the amount of noisy data introduced. Unlabeled samples that pass this dual confidence estimation are selected to update the classifiers. Comparative experiments on UCI datasets verify the effectiveness of the algorithm.
Key words: semi-supervised learning, collaborative training, confidence, classification, unlabeled data
CLC number:
郭涛, 李贵洋, 兰霞. 基于图的半监督协同训练算法[J]. 计算机工程, 2012, 38(13): 163-165,168.
GUO Tao, LI Gui-yang, LAN Xia. Semi-supervised Collaborative Training Algorithm Based on Graph[J]. Computer Engineering, 2012, 38(13): 163-165,168.
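
The abstract describes a dual confidence check: an explicit class probability derived from the graph structure of the data, and an implicit confidence derived from agreement among several classifiers. The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions, not the CESL algorithm itself. It assumes an RBF similarity graph, iterative label propagation with clamped labeled samples, and two toy classifiers (nearest centroid and 1-NN). All function names, `sigma`, and `threshold` are hypothetical.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_class_probs(points, labels, n_classes, sigma=1.0, iters=50):
    """Explicit confidence (assumed form): propagate labels over an RBF graph.

    labels[i] is a class index, or None for an unlabeled sample.
    Labeled rows stay clamped to their one-hot vector.
    """
    n = len(points)
    W = [[0.0 if i == j else
          math.exp(-dist2(points[i], points[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    F = [[1.0 / n_classes] * n_classes if y is None else
         [1.0 if c == y else 0.0 for c in range(n_classes)]
         for y in labels]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total = sum(W[i])
            if labels[i] is not None or total == 0.0:
                new_F.append(F[i])  # clamped or isolated: keep as is
                continue
            # Weighted average of the neighbors' class distributions.
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / total
                          for c in range(n_classes)])
        F = new_F
    return F


def nearest_centroid_predict(train_X, train_y, x, n_classes):
    """Toy classifier 1: class whose labeled centroid is closest to x."""
    best, best_d = None, math.inf
    for c in range(n_classes):
        members = [p for p, y in zip(train_X, train_y) if y == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = dist2(centroid, x)
        if d < best_d:
            best, best_d = c, d
    return best


def one_nn_predict(train_X, train_y, x):
    """Toy classifier 2: label of the closest labeled sample."""
    return min(zip(train_X, train_y), key=lambda t: dist2(t[0], x))[1]


def select_confident(points, labels, n_classes, threshold=0.8, sigma=1.0):
    """Keep unlabeled samples that pass both confidence checks."""
    F = graph_class_probs(points, labels, n_classes, sigma)
    train_X = [p for p, y in zip(points, labels) if y is not None]
    train_y = [y for y in labels if y is not None]
    picked = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y is not None:
            continue
        # Implicit confidence: all classifiers must agree.
        votes = {nearest_centroid_predict(train_X, train_y, p, n_classes),
                 one_nn_predict(train_X, train_y, p)}
        # Explicit confidence: graph probability must exceed the threshold.
        graph_label = max(range(n_classes), key=lambda c: F[i][c])
        if votes == {graph_label} and F[i][graph_label] >= threshold:
            picked.append((i, graph_label))
    return picked


# Two labeled samples, two clearly clustered unlabeled samples, and one
# ambiguous sample halfway between the clusters.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (4.8, 5.1), (2.5, 2.5)]
labels = [0, 1, None, None, None]
print(select_confident(points, labels, n_classes=2))
```

In this toy run, the two samples near a labeled point pass both checks, while the midpoint is rejected because its graph probability falls below the threshold. In a co-training loop, the selected pairs would be added to the labeled set before the classifiers are retrained.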