
Computer Engineering ›› 2012, Vol. 38 ›› Issue (20): 152-155. doi: 10.3969/j.issn.1000-3428.2012.20.039

• Networks and Communications •

Construction of Condition Random Field Model Base Oriented to Word Sense Disambiguation

CHE Ling, ZHANG Yang-sen   

  1. (Institute of Intelligence Information Processing, Beijing Information Science and Technology University, Beijing 100192, China)
  • Received:2011-11-07 Revised:2012-02-05 Online:2012-10-20 Published:2012-10-17

面向词义消歧的条件随机场模型库构建

车 玲,张仰森   

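  1. (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China)
  • About the authors: CHE Ling (b. 1987), female, M.S.; main research interest: intelligent information processing. ZHANG Yang-sen, professor.
  • Supported by:

    National Natural Science Foundation of China (60873013, 61070119); Open Project Fund of the Key Laboratory of Computational Linguistics (Peking University), Ministry of Education (KLCL-1005); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PHR201007131)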

Abstract: Taking the Conditional Random Field (CRF) as the probability model for a word sense disambiguation model base, this paper uses CRF to train separate model files for high-frequency senses and low-frequency senses. It analyzes the probabilities in the tagging results, determines a threshold for judging whether an assigned tag is correct, and uses the best-performing model files and their corresponding thresholds to build the CRF model base. Experimental results show that the model files for low-frequency senses perform better: the accuracy rate exceeds 80%, and the recall rate is high.

Key words: polysemous word, word sense disambiguation, Conditional Random Field (CRF), high-frequency sense, low-frequency sense

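Abstract (translated from the Chinese): With the conditional random field (CRF) as the probability model for building a word sense disambiguation model base, CRF is used to train separately on punctuation-delimited clause corpora for high-frequency senses and for low-frequency senses, and the generated model files are applied in disambiguation experiments. A threshold is determined by analyzing the probability values in the tagging results, so as to separate correctly tagged items from incorrectly tagged ones. The better-performing model files and their corresponding thresholds are used to build the CRF model base for word sense disambiguation. Experimental results show that modeling low-frequency senses disambiguates better than modeling high-frequency senses, reaching an accuracy above 80% while also achieving a relatively high recall.

Key words (translated from the Chinese): polysemous word, word sense disambiguation, conditional random field, high-frequency sense, low-frequency sense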
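
The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is chosen, so everything below is an assumption made for illustration: the function name `pick_threshold`, the 0.8 precision target (borrowed from the reported "above 80%" figure), the reading of "accuracy" as precision over accepted tags, and the toy data. Each input pair holds the probability a CRF assigned to its chosen sense tag and whether that tag matched the gold annotation.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]], min_precision: float = 0.8
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the lowest cut-off whose
    accepted tags reach min_precision, or None if no cut-off qualifies.

    Hypothetical helper: not the selection procedure from the paper.
    """
    total_correct = sum(1 for _, ok in scored if ok)
    best = None
    # Candidate thresholds are the observed probabilities, highest first.
    for t in sorted({p for p, _ in scored}, reverse=True):
        accepted = [ok for p, ok in scored if p >= t]
        hits = sum(accepted)
        precision = hits / len(accepted)
        recall = hits / total_correct if total_correct else 0.0
        if precision >= min_precision:
            # A lower qualifying threshold accepts more tags, so prefer it.
            best = (t, precision, recall)
    return best


# Toy development-set output (made-up numbers, not results from the paper).
demo = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.60, True), (0.55, False), (0.40, False),
]
result = pick_threshold(demo)
print(result)
```

Under this reading, a model file and its threshold would be stored together, and a tag whose probability falls below the threshold would be treated as unreliable rather than accepted.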
