Abstract:
A novel dynamic fuzzy clustering approach is presented. The main defect of traditional fuzzy clustering methods is that the number of clusters must be known in advance. This paper applies the dynamic self-organizing maps algorithm to determine the number of clusters. The text eigenvector (feature vector) is obtained from the vector space model (VSM) and the TF-IDF method. The number of clusters found by the dynamic self-organizing maps is then passed to the fuzzy C-means algorithm (FCM), which produces the clustering result. Compared with using the dynamic self-organizing maps algorithm alone, the present algorithm achieves higher clustering precision. Fuzzy clustering is also better suited to dealing with the semantic variety and complexity of texts. An experimental example demonstrates the effectiveness of the present algorithm.
Key words:
Self-organizing maps; Text eigenvector; Fuzzy clustering; Number of clusters
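
The abstract names two standard building blocks: TF-IDF weighting in the vector space model, and fuzzy C-means clustering for a given number of clusters. The following is a minimal sketch of those two steps only, not the authors' TGFCM implementation. The function names `tfidf_vectors` and `fuzzy_c_means`, the toy English documents, the fuzzifier `m = 2`, and the stopping rule are all assumptions made for illustration. The dynamic self-organizing map step is not reproduced here, so the number of clusters `c` is simply passed in by hand.

```python
import math
import random


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one L2-normalised TF-IDF vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    vectors = []
    for doc in docs:  # each document is assumed to be a non-empty token list
        vec = [doc.count(t) / len(doc) * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means. Returns (centers, memberships); each membership row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix with rows normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    for _ in range(iters):
        # Update each center as the mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = max(sum(w), 1e-12)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(dim)]
            )
        # Update memberships: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # The point coincides with one or more centers: share membership among them.
                zeros = [j for j, dj in enumerate(d) if dj == 0.0]
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                total = sum(inv)
                row = [x / total for x in inv]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy corpus (hypothetical): two finance-like and two sports-like documents.
docs = [
    ["stock", "market", "stock", "price"],
    ["market", "price", "stock"],
    ["football", "match", "goal"],
    ["goal", "match", "football", "football"],
]
vocab, X = tfidf_vectors(docs)
centers, U = fuzzy_c_means(X, c=2)  # c is hand-picked here, not estimated
labels = [max(range(2), key=lambda j: row[j]) for row in U]
```

Each row of `U` holds the degrees of membership of one document in every cluster, which is what lets a text belong partly to several topics. In the approach the abstract describes, the value passed as `c` would come from the dynamic self-organizing map rather than being fixed by hand.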
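Abstract (translated from the Chinese): A new dynamic fuzzy clustering method is proposed. To address the problem that traditional fuzzy clustering requires the number of clusters to be fixed in advance, a dynamic self-organizing map neural network is used to determine the number of clusters. The text feature vectors are determined with the vector space model of text and the TF-IDF method. The number of clusters obtained by the dynamic self-organizing map neural network is then passed to the fuzzy C-means (FCM) algorithm, which yields the clustering result. Compared with the results of running the dynamic self-organizing map neural network alone, the algorithm has the advantage of higher clustering precision, and fuzzy clustering is better suited to handling semantic diversity and the fuzziness of text membership. Experiments verify the effectiveness of the algorithm.
Key words (translated from the Chinese):
Self-organizing map network; Text feature vector; Fuzzy clustering; Number of clusters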
GENG Xinqing, WANG Zheng’ou. TGFCM: A Novel Approach of Chinese Text Mining Based on Fuzzy Clustering[J]. Computer Engineering, 2006, 32(5): 7-9.
耿新青,王正欧. TGFCM:基于模糊聚类的中文文本挖掘的新方法[J]. 计算机工程, 2006, 32(5): 7-9.