Abstract: This paper proposes a Naïve-Bayes (NB) document classification method based on concept feature vectors. The method first applies SOM (Self-Organizing Maps) clustering to a set of unlabeled Web documents to produce a number of initial document classes, and assigns a class label to each. It then builds a concept feature vector for each document class using the maximum entropy method. The final document classifier, CFV-NB, is built on the resulting concept feature vector space.
Key words: document classification; concept feature vectors; NB classifier
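As a rough illustration of the last stage only, the following is a minimal sketch of a multinomial Naïve-Bayes classifier trained on cluster-derived labels. It is not the authors' CFV-NB implementation. The labels `c0` and `c1` are hard-coded stand-ins for the output of SOM clustering, plain terms stand in for concept features, the maximum entropy step is not reproduced, and the names `train_nb` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing.

    docs   : list of token lists (assumed stand-ins for concept features)
    labels : one label per document (assumed to come from a clustering step)
    """
    vocab = {term for doc in docs for term in doc}
    docs_per_class = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)

    model = {}
    for label, n_docs in docs_per_class.items():
        total_terms = sum(term_counts[label].values())
        denom = total_terms + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                term: math.log((term_counts[label][term] + alpha) / denom)
                for term in vocab
            },
        }
    return model


def classify(model, doc):
    """Return the label with the highest log-posterior; unseen terms are ignored."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][term] for term in doc if term in params["log_like"]
        )
    return max(model, key=score)


# Toy data: labels are assumed cluster IDs, not real SOM output.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "price"],
    ["price", "bank", "market"],
]
cluster_labels = ["c0", "c0", "c1", "c1"]

model = train_nb(docs, cluster_labels)
print(classify(model, ["goal", "coach"]))
print(classify(model, ["market", "bank"]))
```

In this sketch the classifier never sees hand-labeled data: the only supervision is the cluster assignment, which mirrors the idea in the abstract of building the classifier on classes obtained from unlabeled documents.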
HE Li, LIU Jun. CFV-NB: Naïve-Bayes Documents Classification Model Based on Concept Feature Vectors [J]. Computer Engineering (计算机工程), 2006, 32(20): 4-6.