Computer Engineering ›› 2012, Vol. 38 ›› Issue (08): 128-130. doi: 10.3969/j.issn.1000-3428.2012.08.042

• Artificial Intelligence and Recognition Technology •

Improved Supervised Algorithm of Text Feature Weighting

LIU Duan-yang, LU Yang

  1. (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
  • Received: 2011-09-12 Online: 2012-04-20 Published: 2012-04-20
  • About the authors: LIU Duan-yang (b. 1975), male, associate professor, Ph.D.; main research interests: data mining, distributed computing. LU Yang, master's degree
  • Supported by:
    National Natural Science Foundation of China (EC0017540)

Abstract: The traditional tf.idf method does not exploit the characteristics of labeled classification data, so it cannot reflect how a term is distributed proportionally across categories. To address this, building on an analysis of the supervised text feature weighting method tf.rf, this paper proposes an improved supervised text feature weighting method called tf.ridf. The method combines the characteristics of tf.idf and tf.rf and weights text features by considering a term's relationship both to the overall document collection and to the documents of each category. Experimental results show that the classification performance of tf.ridf is clearly better than that of tf.rf.

Key words: data mining, text categorization, text representation, feature weighting, supervised method, Support Vector Machine (SVM)
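The contrast the abstract draws between tf.idf and tf.rf can be illustrated with a minimal sketch. It assumes the commonly cited definitions idf = log(N / df) and rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. The function names and the sample counts are hypothetical. The tf.ridf formula itself is not stated in the abstract, so it is deliberately not implemented here.

```python
import math


def tf_idf(tf, n_docs, doc_freq):
    """Unsupervised weight: term frequency times log(N / df).

    Assumes doc_freq >= 1. Class labels play no role.
    """
    return tf * math.log(n_docs / doc_freq)


def tf_rf(tf, pos_with_term, neg_with_term):
    """Supervised weight: tf times rf = log2(2 + a / max(1, c)).

    a = positive-category documents containing the term,
    c = negative-category documents containing the term.
    """
    rf = math.log2(2 + pos_with_term / max(1, neg_with_term))
    return tf * rf


# Two hypothetical terms, each appearing in 100 of 1000 documents,
# but with opposite distributions across the positive/negative classes.
idf_a = tf_idf(1, 1000, 100)   # term A
idf_b = tf_idf(1, 1000, 100)   # term B: same df, so same tf.idf weight
rf_a = tf_rf(1, 90, 10)        # term A concentrated in the positive class
rf_b = tf_rf(1, 10, 90)        # term B concentrated in the negative class
```

Because both terms have the same document frequency, tf.idf gives them the same weight, while tf.rf gives the term concentrated in the positive category a larger weight. This is the class-distribution information the abstract says tf.idf ignores.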

CLC number: