
Computer Engineering ›› 2012, Vol. 38 ›› Issue (20): 152-155. doi: 10.3969/j.issn.1000-3428.2012.20.039

• Artificial Intelligence and Recognition Technology •

Construction of Conditional Random Field Model Base Oriented to Word Sense Disambiguation

CHE Ling, ZHANG Yang-sen

  1. (Institute of Intelligence Information Processing, Beijing Information Science and Technology University, Beijing 100192, China)
  • Received: 2011-11-07 Revised: 2012-02-05 Online: 2012-10-20 Published: 2012-10-17
  • About the authors: CHE Ling (born 1987), female, master's degree, main research interest: intelligent information processing; ZHANG Yang-sen, professor
  • Supported by:

    National Natural Science Foundation of China (60873013, 61070119); Open Project Fund of the Key Laboratory of Computational Linguistics (Peking University), Ministry of Education (KLCL-1005); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PHR201007131)

Abstract: This paper uses the Conditional Random Field (CRF) as the probabilistic model for building a word sense disambiguation model base. CRF models are trained separately on punctuation-delimited clause corpora for high-frequency senses and for low-frequency senses, and the resulting model files are applied in disambiguation experiments. The probability values in the labeling output are analyzed to determine a threshold that separates correctly labeled items from incorrectly labeled ones. The better-performing model files, together with their corresponding thresholds, are used to build the CRF model base for word sense disambiguation. Experimental results show that modeling low-frequency senses disambiguates better than modeling high-frequency senses: accuracy exceeds 80%, and a high recall rate is also obtained.

Key words: polysemous word, word sense disambiguation, Conditional Random Field (CRF), high-frequency sense, low-frequency sense
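
The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes that each labeled item comes with the probability the CRF assigned to its tag and a flag saying whether the tag matched the gold annotation. The function name `pick_threshold`, the `min_precision` parameter, and the selection rule (lowest cutoff whose accepted labels still reach a target accuracy) are illustrative assumptions, not the procedure used in the paper.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]], min_precision: float = 0.8
) -> Optional[float]:
    """Return the lowest probability cutoff whose accepted labels reach min_precision.

    `scored` holds (probability, is_correct) pairs from a labeled development set.
    This selection rule is a hypothetical stand-in for the paper's analysis.
    """
    # Walk from the most confident label to the least confident one.
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    best = None
    correct = 0
    for kept, (prob, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Evaluate a cutoff only after the last item of a run of tied probabilities.
        if kept < len(ranked) and ranked[kept][0] == prob:
            continue
        # Accepting everything with probability >= prob keeps `kept` items.
        if correct / kept >= min_precision:
            best = prob
    return best


def accept(prob: float, threshold: float) -> bool:
    """Treat a label as reliable when its probability reaches the threshold."""
    return prob >= threshold


# Hypothetical development data: (tag probability, tag was correct).
dev = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, True),
    (0.75, True), (0.70, False), (0.60, False), (0.40, False),
]
threshold = pick_threshold(dev)
```

Lowering the cutoff accepts more labels, which raises recall but admits more errors, so a rule like this picks the most permissive cutoff that still meets the accuracy target.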
