
Computer Engineering ›› 2010, Vol. 36 ›› Issue (9): 200-202. doi: 10.3969/j.issn.1000-3428.2010.09.070

• Artificial Intelligence and Recognition Technology •

Text Categorization Method Based on Features Vote

JIAO Qing-zheng (1, 2), WEI Cheng-jian (1)

  1. College of Information Science and Engineering, Nanjing University of Technology, Nanjing 210009
  2. Information Management Center, Anhui Normal University, Wuhu 241000
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-05-05 Published: 2010-05-05

Abstract: This paper presents a linear text categorization method based on a feature voting mechanism. It applies trust mechanism theory to analyze the trust relation between document classes and features, and gives a concrete model for computing feature trust values. The method is compared with traditional methods on three widely used corpora with different characteristics: Newsgroup, the Fudan Chinese text categorization corpus, and Reuters-21578. Experimental results show that the method outperforms the traditional methods in categorization performance, is stable and efficient, and is suitable for large-scale text categorization tasks.

Key words: text categorization, features vote, empirical probability, natural language processing

CLC Number:
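
The abstract describes classification by feature voting but does not give the trust model itself. As a rough illustration only, the sketch below assumes that the trust a class places in a feature is the empirical probability of that class among the training documents containing the feature, and that each feature in a new document casts that value as its vote. The function names `train` and `classify`, and this choice of trust value, are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict


def train(docs):
    """Estimate a per-feature trust table from labeled documents.

    docs: iterable of (tokens, label) pairs.
    Returns {feature: {label: trust}}, where trust is the fraction of
    training documents containing the feature that carry that label.
    This empirical-probability trust value is an assumed stand-in for
    the paper's trust model, which the abstract does not specify.
    """
    doc_counts = defaultdict(Counter)  # feature -> label -> document count
    for tokens, label in docs:
        for feature in set(tokens):  # count each feature once per document
            doc_counts[feature][label] += 1

    trust = {}
    for feature, counts in doc_counts.items():
        total = sum(counts.values())
        trust[feature] = {label: n / total for label, n in counts.items()}
    return trust


def classify(trust, tokens):
    """Let every known feature vote for each class with its trust value.

    Returns the label with the highest vote total, or None when no
    token in the document was seen during training.
    """
    votes = Counter()
    for feature in tokens:
        for label, weight in trust.get(feature, {}).items():
            votes[label] += weight
    if not votes:
        return None
    return max(votes, key=votes.get)
```

Under these assumptions, scoring a document takes one pass over its tokens with a table lookup per token, so the cost grows linearly with document length, which is in line with the abstract's description of the method as linear and suited to large-scale tasks.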