Computer Engineering ›› 2006, Vol. 32 ›› Issue (20): 4-6. doi: 10.3969/j.issn.1000-3428.2006.20.002

• Doctoral Dissertation •

CFV-NB: Naïve-Bayes Document Classification Model Based on Concept Feature Vectors

HE Li1,2, LIU Jun2   

  1. College of Management, Tianjin University, Tianjin 300072; 2. College of Technology, Tianjin University of Finance and Economics, Tianjin 300222
  • Received:1900-01-01 Revised:1900-01-01 Online:2006-10-20 Published:2006-10-20

Abstract: This paper proposes a Naïve-Bayes (NB) document classification method based on concept feature vectors. The method first applies SOM (Self-Organizing Map) clustering to a set of unlabeled Web documents to produce several initial document classes, and assigns a class label to each. It then builds a concept feature vector for each initial class using the maximum entropy method. The final document classifier, CFV-NB, is built on the resulting concept feature vector space.
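
The last two stages of the pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the SOM clustering step has already run and produced the hypothetical `clusters` mapping from a class label to tokenized documents. Because the abstract does not detail the maximum entropy step, the sketch swaps in a simple stand-in score, P(t|c) · log(P(t|c)/P(t)), to pick each class's feature terms. All function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def concept_features(clusters, k=2):
    """Pick the k highest-scoring terms per cluster.

    The score P(t|c) * log(P(t|c) / P(t)) is a stand-in assumption,
    not the maximum entropy method used in the paper.
    """
    per_cluster = {
        label: Counter(t for doc in docs for t in doc)
        for label, docs in clusters.items()
    }
    total = Counter()
    for counts in per_cluster.values():
        total.update(counts)
    n_total = sum(total.values())

    features = {}
    for label, counts in per_cluster.items():
        n_c = sum(counts.values())

        def score(t):
            p_tc = counts[t] / n_c
            p_t = total[t] / n_total
            return p_tc * math.log(p_tc / p_t)

        # Ties are broken alphabetically so the result is deterministic.
        features[label] = sorted(counts, key=lambda t: (-score(t), t))[:k]
    return features


def train_nb(clusters, features):
    """Train a multinomial NB model restricted to the selected feature terms."""
    vocab = sorted({t for terms in features.values() for t in terms})
    n_docs = sum(len(docs) for docs in clusters.values())
    model = {}
    for label, docs in clusters.items():
        counts = Counter(t for doc in docs for t in doc if t in vocab)
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        log_prior = math.log(len(docs) / n_docs)
        log_like = {t: math.log((counts[t] + 1) / denom) for t in vocab}
        model[label] = (log_prior, log_like)
    return model


def classify(model, doc):
    """Return the label with the highest log-posterior; unknown terms are ignored."""
    def log_post(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)

    return max(sorted(model), key=log_post)


# Hypothetical output of the clustering step: label -> tokenized documents.
clusters = {
    "c0": [["stock", "market", "price"], ["market", "trade", "price"]],
    "c1": [["match", "team", "goal"], ["team", "score", "goal"]],
}
features = concept_features(clusters, k=2)
model = train_nb(clusters, features)
print(features)
print(classify(model, ["price", "market", "bank"]))
```

Restricting the NB vocabulary to the union of the per-class feature terms is one plausible reading of "the space of concept feature vectors"; the paper itself may define that space differently.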

Key words: Document classification, Concept feature vectors, NB classifier

CLC Number: