Computer Engineering ›› 2012, Vol. 38 ›› Issue (01): 195-196,210. doi: 10.3969/j.issn.1000-3428.2012.01.062

• Artificial Intelligence and Recognition Technology •

Centroid Vector Construction Method Based on Feature Selection

XIE Hua, WANG Jian, LIN Hong-fei, YANG Zhi-hao

  1. (School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China)
  • Received: 2011-05-30 Online: 2012-01-05 Published: 2012-01-05
  • About the authors: XIE Hua (born 1984), male, master's student, main research interest: text classification; WANG Jian, associate professor; LIN Hong-fei, professor, Ph.D., doctoral supervisor; YANG Zhi-hao, associate professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (60673039, 60973068); National "863" Program of China (2006AA01Z151); Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002)

Abstract: Centroid-based text categorization methods are sensitive to the model and tend to show poor classification performance. To address this, this paper proposes FSCC, a method for constructing category centroid vectors based on feature selection. The method computes a feature selection value between each feature and each category, obtains each category's centroid vector from a centroid feature weight formula, and uses a non-normalized cosine similarity measure to score the similarity between a document vector and a centroid, which determines the assigned category. Experimental results show that FSCC significantly outperforms traditional centroid-based methods and the Support Vector Machine (SVM) method.

Key words: feature selection, feature weight, cosine similarity, centroid, text classification
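
The abstract outlines a pipeline (feature selection values, centroid weights, non-normalized cosine scoring) without giving the formulas. The following is a minimal Python sketch of that kind of pipeline, not the paper's FSCC formulas. It rests on three assumptions made only for illustration: the chi-square statistic stands in for the feature selection value, the centroid weight of a term is simply its chi-square score when the term is positively associated with the category, and "non-normalized cosine" is read as normalizing the document vector but not the centroid. The function names `chi_square`, `build_centroids`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi_square(a, b, c, d):
    """Chi-square score from a 2x2 table (assumed feature selection measure).

    a: docs in the class containing the term, b: docs outside the class
    containing it, c: docs in the class without it, d: docs outside without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def build_centroids(docs, labels):
    """Build one weight dictionary (centroid) per class.

    docs is a list of token lists; labels is the parallel list of class names.
    """
    n_docs = len(docs)
    class_size = Counter(labels)
    df = Counter()                   # document frequency of each term
    class_df = defaultdict(Counter)  # document frequency within each class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            class_df[label][term] += 1

    centroids = {}
    for label, size in class_size.items():
        weights = {}
        for term, a in class_df[label].items():
            b = df[term] - a
            c = size - a
            d = n_docs - size - b
            # Assumption: keep only terms positively associated with the class.
            if a * d > b * c:
                weights[term] = chi_square(a, b, c, d)
        centroids[label] = weights
    return centroids


def classify(tokens, centroids):
    """Score a document against every centroid and return the best class.

    The document vector (raw term counts) is length-normalized; the centroid
    is deliberately left unnormalized (assumed reading of the measure).
    """
    tf = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in tf.values())) or 1.0
    scores = {
        label: sum(tf[t] * w for t, w in weights.items() if t in tf) / norm
        for label, weights in centroids.items()
    }
    return max(scores, key=scores.get)


# Toy corpus, invented purely for illustration.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match"],
    ["stock", "market"],
    ["market", "bank", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]
centroids = build_centroids(docs, labels)
print(classify(["goal", "team", "bank"], centroids))
```

Leaving the centroid unnormalized means categories with strongly discriminative terms keep their larger weights at scoring time, which is one plausible motivation for such a measure. Whether the paper's weighting behaves the same way cannot be determined from the abstract alone.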

CLC number: