Active Learning Algorithm Based on Tri-training

doi:10.3969/j.issn.1000-3428.2014.06.046

Computer Engineering

ZHANG Yan¹,², WU Bao-guo¹, LV Dan-ju², LIN Ying³

(1. School of Information, Beijing Forestry University, Beijing 100083, China; 2. School of Computer and Information, Southwest Forestry University, Kunming 650224, China; 3. School of Software, Yunnan University, Kunming 650091, China)

Received: 2013-02-28 Published: 2014-06-15 Released: 2014-06-13
About the authors: ZHANG Yan (born 1973), female, associate professor and Ph.D. candidate, main research interests: machine learning and intelligent information processing; WU Bao-guo (corresponding author), professor and doctoral supervisor; LV Dan-ju, associate professor and Ph.D. candidate; LIN Ying, associate professor, Ph.D.
Supported by:
Scientific Research Fund of the Yunnan Provincial Department of Education (2010Y290, 2012C098).

Abstract

Abstract: Semi-supervised learning and active learning both exploit unlabeled data to improve the recognition performance of supervised learning while keeping the cost of labeling low. This paper therefore combines active learning with Tri-training, a semi-supervised learning algorithm, and proposes a new classification algorithm in which the samples for active learning are selected by Entropy Priority Sampling (EPS). Experiments on UCI datasets and remote sensing data, run with different proportions of labeled training samples, show that the algorithm achieves good results when few labeled samples are available. Combining active learning with the Tri-training algorithm is an effective way to improve classification performance and generalization.

Key words: semi-supervised learning, active learning, Tri-training algorithm, Entropy Priority Sampling (EPS), Tri-EPS algorithm
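
The abstract names entropy-based selection of query samples but does not give the EPS formula. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes the entropy is computed over the label votes of the three Tri-training classifiers, and that the unlabeled samples with the highest vote entropy are sent to the annotator first. The function names `vote_entropy` and `entropy_priority_sample`, and the toy votes, are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in bits) of the label votes cast for one sample."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_priority_sample(committee_votes, k):
    """Return the indices of the k samples with the highest vote entropy.

    committee_votes[i] holds the labels predicted for unlabeled sample i
    by the three classifiers (an assumption of this sketch).
    """
    ranked = sorted(
        range(len(committee_votes)),
        key=lambda i: vote_entropy(committee_votes[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy votes from three hypothetical classifiers on three unlabeled samples.
votes = [
    ("a", "a", "a"),  # full agreement: lowest entropy
    ("a", "b", "c"),  # full disagreement: highest entropy
    ("a", "a", "b"),  # partial disagreement
]
query = entropy_priority_sample(votes, 2)  # indices to hand to the annotator
```

In this reading, samples on which the three classifiers agree would remain candidates for Tri-training's own pseudo-labeling, while the most contested ones are labeled manually.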

CLC Number: TP181

ZHANG Yan, WU Bao-guo, LV Dan-ju, LIN Ying. Active Learning Algorithm Based on Tri-training[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2014/V40/I6/215



