
Computer Engineering ›› 2006, Vol. 32 ›› Issue (23): 216-217. doi: 10.3969/j.issn.1000-3428.2006.23.077

• Artificial Intelligence and Recognition Technology •

LPU Based on Single Side Bagging

SHEN Lei (沈蕾), SHI Shengping (石盛平), YAN Jikun (燕继坤)

  1. (State Key Laboratory of Blind Signal Processing, Southwest Institute of Electronics & Telecommunication Technology, Chengdu 610041)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2006-12-05 Published: 2006-12-05

Abstract: This paper combines single side bagging (SSBagging) with the basic idea of LPU (learning from positive and unlabeled examples) to classify unbalanced data. The method has three main steps. First, all unlabeled instances are labeled as negative and, together with the positive examples, are used to train a single side bagging classifier. Second, that classifier is applied to the unlabeled instances to obtain reliable negatives (RN). Third, an SSBagging classifier is trained on the positive examples and RN. A representative LPU technique, proposed by Liu et al., uses the Rocchio method and the EM algorithm for classification. The paper compares that technique with the proposed method and finds that the proposed method performs better when the data are markedly unbalanced.

Key words: classification of unbalanced data, unlabeled data, Bagging, EM
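
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define single side bagging, so the sketch assumes that each bag keeps every positive example and bootstrap-samples an equally sized set of negatives. The nearest-centroid base learner, the majority vote, and all function names (`train_ss_bagging`, `predict`, `pu_ss_bagging`, `make_demo`) are hypothetical choices made only for this example.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ss_bagging(pos, neg, n_bags=11, seed=0):
    """ASSUMED form of single side bagging: each bag uses all positives
    and a bootstrap sample of negatives with the same size as `pos`.
    Each bag is a nearest-centroid model (positive centre, negative centre)."""
    rng = random.Random(seed)
    pos_center = centroid(pos)
    models = []
    for _ in range(n_bags):
        neg_sample = [rng.choice(neg) for _ in range(len(pos))]
        models.append((pos_center, centroid(neg_sample)))
    return models


def predict(models, x):
    """Majority vote over the bags; True means 'positive'."""
    votes = sum(sq_dist(x, cp) < sq_dist(x, cn) for cp, cn in models)
    return 2 * votes > len(models)


def pu_ss_bagging(pos, unlabeled, n_bags=11, seed=0):
    """Two-stage positive/unlabeled learning following the abstract's steps."""
    # Step 1: treat every unlabeled instance as negative and train.
    first = train_ss_bagging(pos, unlabeled, n_bags, seed)
    # Step 2: unlabeled instances still classified negative become RN.
    rn = [u for u in unlabeled if not predict(first, u)]
    if not rn:  # nothing reliable was found; fall back to the first model
        return first, rn
    # Step 3: retrain on the positives and the reliable negatives.
    return train_ss_bagging(pos, rn, n_bags, seed + 1), rn


def make_demo(seed=42):
    """Hypothetical toy data: few labeled positives, and an unlabeled pool
    made of many negatives plus a handful of hidden positives."""
    rng = random.Random(seed)

    def blob(cx, cy, n):
        return [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)) for _ in range(n)]

    pos = blob(3.0, 3.0, 10)
    unlabeled = blob(0.0, 0.0, 200) + blob(3.0, 3.0, 5)
    return pos, unlabeled
```

Calling `pu_ss_bagging(*make_demo())` returns the final ensemble together with the extracted RN set; `predict(model, x)` then classifies a new point. A real experiment would replace the centroid learner with the paper's base classifier and text features, which the abstract does not specify.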