
Computer Engineering, 2009, Vol. 35, Issue (10): 201-202. doi: 10.3969/j.issn.1000-3428.2009.10.066

• Artificial Intelligence and Recognition Technology •

High Quality Algorithm for Chinese Short Messages Text Clustering Based on Semantic

LIU Jin-ling

  1. (Department of Computer Engineering, Huaiyin Institute of Technology, Huaian 223003)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-05-20 Published: 2009-05-20

Abstract: Existing data clustering methods ignore the latent similarity between words when they process text data, which leads to unsatisfactory clustering results. This paper proposes a semantics-based clustering algorithm for Chinese short message texts. It defines similarity measures for Chinese concepts, words, and short message texts, and clusters the texts through downward chain fission followed by upward pairwise merging. Experimental results show that the algorithm achieves higher clustering quality than traditional algorithms.

Key words: short message text, semantics, concept similarity
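
The abstract names the ingredients (concept-level word similarity, a text similarity built on it, a downward splitting pass, and an upward pairwise merging pass) but gives no formulas. The following is only a minimal Python sketch of how such a pipeline could fit together, not the paper's method. Everything specific in it is an assumption made for illustration: the toy `CONCEPT` lexicon, the 0.8 score for words that share a concept, the best-match averaging in `text_sim`, the average-link score in `cluster_sim`, and both thresholds.

```python
from itertools import combinations

# Hypothetical toy lexicon mapping words to concept labels (not from the paper).
CONCEPT = {
    "meeting": "event", "conference": "event",
    "dinner": "meal", "lunch": "meal",
    "tomorrow": "time", "tonight": "time",
}

def word_sim(a, b):
    """Assumed scoring: 1.0 for identical words, 0.8 for a shared concept, else 0.0."""
    if a == b:
        return 1.0
    ca, cb = CONCEPT.get(a), CONCEPT.get(b)
    return 0.8 if ca is not None and ca == cb else 0.0

def text_sim(s, t):
    """Symmetric average of each word's best match in the other text."""
    if not s or not t:
        return 0.0
    fwd = sum(max(word_sim(a, b) for b in t) for a in s) / len(s)
    bwd = sum(max(word_sim(a, b) for a in s) for b in t) / len(t)
    return (fwd + bwd) / 2

def split(texts, cluster, threshold):
    """Downward pass: split a cluster while its least similar pair is below threshold."""
    if len(cluster) < 2:
        return [cluster]
    a, b = min(combinations(cluster, 2),
               key=lambda p: text_sim(texts[p[0]], texts[p[1]]))
    if text_sim(texts[a], texts[b]) >= threshold:
        return [cluster]
    left, right = [a], [b]  # the two least similar texts seed the two halves
    for m in cluster:
        if m in (a, b):
            continue
        closer_to_a = text_sim(texts[m], texts[a]) >= text_sim(texts[m], texts[b])
        (left if closer_to_a else right).append(m)
    return split(texts, left, threshold) + split(texts, right, threshold)

def cluster_sim(texts, c1, c2):
    """Average pairwise text similarity between two clusters."""
    total = sum(text_sim(texts[i], texts[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def merge(texts, clusters, threshold):
    """Upward pass: repeatedly merge the most similar pair of clusters."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_sim(texts, clusters[p[0]], clusters[p[1]]))
        if cluster_sim(texts, clusters[i], clusters[j]) < threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

def cluster_texts(texts, split_threshold, merge_threshold):
    """Split from one all-inclusive cluster, then merge pairs; returns lists of text indices."""
    parts = split(texts, list(range(len(texts))), split_threshold)
    return merge(texts, parts, merge_threshold)
```

In this sketch each text is a tuple of already segmented words, and the result is a list of clusters holding indices into the input list. Using a stricter threshold for splitting than for merging lets the downward pass over-split and the upward pass repair it, which is one plausible reading of the two-phase design described in the abstract.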
