
Computer Engineering ›› 2011, Vol. 37 ›› Issue (21): 124-125,130. doi: 10.3969/j.issn.1000-3428.2011.21.042

• Artificial Intelligence and Recognition Technology •

Relational Text Classification Algorithm Based on iTopicModel

LIANG Peng-peng, CHAI Yu-mei, WANG Li-ming

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450001)
  • Received: 2011-04-12 Online: 2011-11-05 Published: 2011-11-05
  • About the authors: LIANG Peng-peng (born 1986), male, master's degree, main research interests: machine learning and data mining; CHAI Yu-mei, associate professor, master's degree; WANG Li-ming, professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (60970083)

Abstract: Traditional text classification methods give too little weight to the links among documents. To address this problem, this paper proposes TC-iTM, a relational text classification algorithm based on iTopicModel. TC-iTM first uses the probabilities with which labeled documents are assigned to each topic to determine the category that each topic represents. It then classifies unlabeled documents using both the probabilities with which they are assigned to each topic and their text information. Experimental results show that TC-iTM outperforms traditional text classification methods when the links among documents in the document network strongly influence the documents' categories.

Key words: text classification, document network, topic model, EM algorithm
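
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-document topic probabilities have already been estimated by a topic model such as iTopicModel (which is not implemented here), and the function names `topic_to_class` and `classify`, the majority-mass rule for mapping topics to categories, and the linear weight `alpha` for combining topic evidence with text evidence are all hypothetical choices that the abstract does not specify.

```python
def topic_to_class(theta_labeled, labels, n_classes):
    """Assign each topic the category whose labeled documents give it the most mass.

    theta_labeled: list of per-document topic probability lists (assumed given).
    labels: category index of each labeled document.
    """
    n_topics = len(theta_labeled[0])
    # mass[c][k] = total probability of topic k over labeled documents of category c
    mass = [[0.0] * n_topics for _ in range(n_classes)]
    for row, c in zip(theta_labeled, labels):
        for k, p in enumerate(row):
            mass[c][k] += p
    return [max(range(n_classes), key=lambda c: mass[c][k]) for k in range(n_topics)]


def classify(theta_unlabeled, topic_class, text_proba, alpha=0.5):
    """Combine topic evidence with text evidence (hypothetical linear mix)."""
    predictions = []
    for theta_d, text_d in zip(theta_unlabeled, text_proba):
        # Start from the text-only category probabilities, weighted by (1 - alpha).
        score = [(1 - alpha) * p for p in text_d]
        # Add each topic's probability to the category that the topic represents.
        for k, c in enumerate(topic_class):
            score[c] += alpha * theta_d[k]
        predictions.append(max(range(len(score)), key=score.__getitem__))
    return predictions


# Toy data, invented for illustration only: 2 topics, 2 categories.
theta_labeled = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
labels = [1, 1, 0]
topic_class = topic_to_class(theta_labeled, labels, n_classes=2)

theta_unlabeled = [[0.7, 0.3], [0.1, 0.9]]
text_proba = [[0.5, 0.5], [0.6, 0.4]]  # stand-in for any text-only classifier's output
predictions = classify(theta_unlabeled, topic_class, text_proba)
print(topic_class, predictions)
```

In this toy run, topic 0 is dominated by labeled documents of category 1 and topic 1 by category 0, so an unlabeled document that is mostly topic 0 is pushed toward category 1 even when its text evidence alone is undecided.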
