Computer Engineering (计算机工程) ›› 2009, Vol. 35 ›› Issue (10): 14-17. doi: 10.3969/j.issn.1000-3428.2009.10.005

• Doctoral Dissertation •

Comparative Research on Symmetric and Asymmetric Word Clustering Models

SUN Yue-heng (孙越恒), CAO Gui-hong (曹桂宏), HOU Yue-xian (侯越先)

  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300072)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-05-20 Published: 2009-05-20

Abstract: Word clustering is an important natural language processing problem in fields such as speech recognition and intelligent information retrieval. This paper implements a symmetric clustering model based on mutual information and, to address that model's failure to take word order into account, proposes a new asymmetric clustering model. Depending on the position of the clustered word relative to other words, the asymmetric model is divided into two sub-models: a conditional clustering model and a predictive clustering model. Experiments on a large-scale data set show that the asymmetric clustering model clusters words more effectively than the symmetric clustering model.

Key words: word clustering, symmetric clustering model, asymmetric clustering model
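
The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the criterion is the average mutual information between the cluster of a word and the cluster of the word that follows it, which is a common choice for mutual-information-based clustering. The function name `class_bigram_mi` and the two mapping arguments are hypothetical. Passing the same mapping for both positions corresponds to a symmetric model; passing different mappings lets a word belong to one cluster when it conditions the next word and to another when it is the word being predicted, which is one way to read the asymmetric idea.

```python
import math
from collections import Counter


def class_bigram_mi(bigrams, left_map, right_map):
    """Average mutual information (in bits) between adjacent word clusters.

    bigrams   : list of (w1, w2) pairs of adjacent words
    left_map  : word -> cluster id used when the word is in the first position
    right_map : word -> cluster id used when the word is in the second position
    (Both mappings are illustrative assumptions, not taken from the paper.)
    """
    # Count cluster pairs, then derive the marginal counts for each position.
    joint = Counter((left_map[a], right_map[b]) for a, b in bigrams)
    total = sum(joint.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in joint.items():
        left[c1] += n
        right[c2] += n

    # Sum p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) ) over observed pairs.
    mi = 0.0
    for (c1, c2), n in joint.items():
        mi += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return mi


# Toy data: two word pairs of one kind, two of another.
bigrams = [("the", "cat"), ("the", "dog"), ("runs", "fast"), ("runs", "slow")]

# Symmetric case: one cluster mapping shared by both positions.
shared = {"the": "D", "runs": "V", "cat": "N", "dog": "N", "fast": "A", "slow": "A"}
mi_symmetric = class_bigram_mi(bigrams, shared, shared)

# Asymmetric case: a separate (here deliberately coarse) mapping for the first position.
coarse_left = {"the": "X", "runs": "X"}
mi_asymmetric = class_bigram_mi(bigrams, coarse_left, shared)
```

In this toy data the shared mapping keeps the dependency between adjacent clusters, while collapsing all first-position words into a single cluster removes it, so the two calls return different scores. A clustering procedure would search for the mappings that keep this score high.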
