Abstract: To classify unlabeled text more effectively, this paper presents a text classification approach based on fuzzy relationships. Membership functions are defined for texts and for categories, so that each test text and each category is represented as a fuzzy set over features. The correlation coefficient between these fuzzy sets is computed and used to measure the membership degree of the test text in each category, and the test text is assigned to a category by the maximum membership principle. Experimental results show that, compared with the k-NN algorithm, the method achieves better accuracy and considerably faster classification.
Key words: text classification, membership function, fuzzy relationship, correlation coefficient, membership degree
CLC number:
张玉芳, 娄娟, 李智星, 熊忠阳. 基于模糊关系的文本分类方法[J]. 计算机工程, 2011, 37(16): 149-151.
ZHANG Yu-Fang, LOU Juan, LI Zhi-Xing, XIONG Zhong-Yang. Text Classification Approach Based on Fuzzy Relationship[J]. Computer Engineering, 2011, 37(16): 149-151.
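
The pipeline summarized in the abstract (fuzzy sets over features, a correlation coefficient as the membership degree, then the maximum membership principle) can be illustrated with a minimal sketch. The abstract does not state the paper's actual membership functions or its correlation formula, so everything specific here is an assumption made for illustration: a term's membership in a document is taken as its frequency divided by the largest frequency in that document, a category's membership is the mean of its training documents' memberships, and the correlation coefficient is the normalized inner product of the two membership vectors. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def doc_membership(tokens):
    """Fuzzy set of a document: term -> tf / max tf (assumed membership function)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def category_membership(docs):
    """Fuzzy set of a category: mean document membership per term (assumed)."""
    total = Counter()
    for tokens in docs:
        for term, mu in doc_membership(tokens).items():
            total[term] += mu
    return {term: s / len(docs) for term, s in total.items()}


def correlation(a, b):
    """Assumed correlation coefficient: sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    num = sum(mu * b.get(term, 0.0) for term, mu in a.items())
    den = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return num / den if den else 0.0


def classify(tokens, categories):
    """Score the text against every category and pick the maximum membership."""
    doc = doc_membership(tokens)
    scores = {label: correlation(doc, fs) for label, fs in categories.items()}
    return max(scores, key=scores.get), scores


# Toy training data (hypothetical), already tokenized.
training = {
    "sports": [["ball", "team", "goal"], ["team", "match"]],
    "finance": [["stock", "market"], ["market", "bank", "stock"]],
}
categories = {label: category_membership(docs) for label, docs in training.items()}

label, scores = classify(["team", "goal", "goal"], categories)
print(label, scores)
```

Because each category is reduced to a single fuzzy set, a test text is compared against one vector per category rather than against every training document as in k-NN, which is consistent with the speed-up the abstract reports.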