Abstract:
This paper presents a linear, highly efficient text categorization approach based on feature voting. It applies trust mechanism theory to analyze the trust relation between document classes and features, and gives a model for calculating feature trust values. The method is compared with traditional methods on three widely used corpora with different characteristics: Newsgroup, the Fudan Chinese text classification corpus, and Reuters-21578. Experimental results show that the method outperforms the traditional methods in classification performance, is stable and efficient, and is suitable for large-scale text categorization.
Key words:
text categorization, features vote, empirical probability, natural language processing
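The abstract does not give the trust model itself, so the following is only a minimal sketch of what a feature-voting classifier could look like, not the authors' formulation. It assumes that the trust value of a feature for a class is the empirical probability P(class | feature) estimated from training documents, and that each feature in a test document casts a vote for every class weighted by that value. The function names `train_trust` and `classify` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_trust(docs, labels):
    """Estimate trust[feature][cls] as the empirical P(cls | feature).

    ASSUMPTION: this stands in for the paper's trust model, which the
    abstract does not specify.
    """
    feature_class_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        # Count each feature once per document (document frequency).
        for feature in set(tokens):
            feature_class_counts[feature][label] += 1

    trust = {}
    for feature, counts in feature_class_counts.items():
        total = sum(counts.values())
        trust[feature] = {cls: n / total for cls, n in counts.items()}
    return trust


def classify(tokens, trust):
    """Let every known feature vote for each class, weighted by its trust."""
    votes = Counter()
    for feature in tokens:
        for cls, value in trust.get(feature, {}).items():
            votes[cls] += value
    if not votes:
        return None  # no feature of the document was seen in training
    return max(votes, key=votes.get)


# Toy training data (hypothetical, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "win"],
    ["stock", "market", "trade"],
    ["market", "bank", "win"],
]
labels = ["sports", "sports", "finance", "finance"]

trust = train_trust(docs, labels)
prediction = classify(["team", "win", "goal"], trust)
```

Under these assumptions, training is a single pass over the corpus and classification is a single pass over the document's features, which is consistent with the linear cost the abstract claims.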
摘要: 基于特征投票机制设计一种线性文本分类方法,运用信任机制理论分析文档类别对特征的信任关系,给出具体特征信任度的模型,并在Newsgroup、复旦中文分类语料、Reuters-21578 3个广泛使用且具有不同特性的语料集上与传统方法进行比较。实验结果表明,该方法分类性能优于传统方法且稳定、高效,适用于大规模文本分类任务。
关键词:
文本分类, 特征投票, 经验概率, 自然语言处理
JIAO Qing-zheng; WEI Cheng-jian. Text Categorization Method Based on Features Vote[J]. Computer Engineering, 2010, 36(9): 200-202.
焦庆争;蔚承建. 一种基于特征投票的文本分类方法[J]. 计算机工程, 2010, 36(9): 200-202.