Abstract: Extracting an initial set of strongly negative examples from a set of positive examples and a set of unlabeled examples is the basis for building a text classifier for the PU (positive and unlabeled) problem with the two-stage framework. This paper describes the limitations of the 1-DNF algorithm for extracting the initial strongly negative set and proposes an improved 1-DNF algorithm. Experimental results show that, compared with the original 1-DNF algorithm, the improved algorithm greatly increases the number of strongly negative examples extracted from the positive and unlabeled data, accelerates the convergence of the algorithm, and raises the precision of the classifier.
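The extraction step this work builds on can be illustrated with a minimal sketch of the original 1-DNF idea as it is commonly described in the PU-learning literature. A term counts as a "positive feature" if its relative document frequency in the positive set P exceeds its relative document frequency in the unlabeled set U, and every document in U that contains none of the positive features is taken as a strong negative. This sketch does not implement the improved algorithm proposed in the paper, whose details are not given here. The function names, the representation of a document as a set of terms, and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Set

Doc = Set[str]  # hypothetical representation: a document as its set of distinct terms


def doc_frequency(docs: List[Doc]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    counts: Counter = Counter()
    for d in docs:
        counts.update(d)
    return counts


def positive_features(P: List[Doc], U: List[Doc]) -> Set[str]:
    """Terms whose relative document frequency is higher in P than in U."""
    if not P or not U:
        raise ValueError("P and U must both be non-empty")
    fp, fu = doc_frequency(P), doc_frequency(U)
    # Terms absent from P have frequency 0 there, so only terms seen in P qualify.
    # Counter returns 0 for terms that never occur in U.
    return {t for t in fp if fp[t] / len(P) > fu[t] / len(U)}


def one_dnf_strong_negatives(P: List[Doc], U: List[Doc]) -> List[Doc]:
    """Keep the unlabeled documents that share no term with the positive features."""
    pf = positive_features(P, U)
    return [d for d in U if not (d & pf)]


# Hypothetical toy data: P is about sports, U mixes sports and finance.
P = [{"ball", "goal", "team"}, {"team", "match"}]
U = [{"ball", "team"}, {"stock", "market"}, {"goal", "price"}, {"market", "price"}]

if __name__ == "__main__":
    print(sorted(positive_features(P, U)))
    print(one_dnf_strong_negatives(P, U))
```

In this toy example the positive features are `ball`, `goal`, `match` and `team`, so only the two finance documents without any of those terms are returned as strong negatives. Note that a single positive feature is enough to exclude a document (here `{"goal", "price"}`), which is what makes the rule strict.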
Key words: text classification, PU-oriented text classification, text classifier
HE Fengling; ZUO Wanli; YU Hailong. Method for Extracting Strongly Negative Data Set by Improved 1-DNF Algorithm[J]. Computer Engineering, 2007, 33(09): 191-193.