Computer Engineering, 2009, Vol. 35, Issue 18: 15-18. doi: 10.3969/j.issn.1000-3428.2009.18.006

• Doctoral Dissertation •

Approach to Chinese Word Sense Disambiguation and Tagging Based on Maximum Entropy Models

ZHANG Yang-sen (张仰森)1,2

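  (1. Institute of Intelligent Information Processing, Beijing Information Science & Technology University, Beijing 100192; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080)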
  • Online: 2009-09-20  Published: 2009-09-20

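Abstract: This paper analyzes the principles of open-source maximum entropy model code and the meaning of each of its parameters. Features are screened and filtered from the candidate feature set by combining feature frequency with average mutual information. A maximum entropy model for Chinese word sense disambiguation (WSD) is implemented in Delphi, and the model parameters are computed with the GIS (Generalized Iterative Scaling) algorithm. A set of linguistic knowledge rules is incorporated to alleviate data sparseness in the training corpus. The resulting Chinese word sense disambiguation and tagging system was used to tag the senses of more than 800 polysemous words and achieved fairly good tagging accuracy.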
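The abstract does not spell out how frequency and average mutual information are combined. The following Python sketch assumes one plausible reading: first discard context words that occur in fewer than `min_freq` training instances, then rank the survivors by the average mutual information I(F;S) between the binary feature F = "the word occurs in the context" and the sense label S. The function names, the parameters `min_freq` and `top_k`, and the toy data for the polysemous verb 打 are all hypothetical illustrations, not the author's code.

```python
import math
from collections import Counter


def average_mutual_information(word, samples):
    """Average MI I(F;S) between F = [word occurs in the context] and the sense S."""
    n = len(samples)
    joint = Counter((word in ctx, sense) for ctx, sense in samples)
    f_marg = Counter(word in ctx for ctx, _ in samples)
    s_marg = Counter(sense for _, sense in samples)
    ami = 0.0
    for (f, s), count in joint.items():
        p_fs = count / n
        # Zero-count cells contribute nothing, so only observed (f, s) pairs are summed.
        ami += p_fs * math.log(p_fs / ((f_marg[f] / n) * (s_marg[s] / n)))
    return ami


def select_features(samples, min_freq=2, top_k=50):
    """Hypothetical two-stage filter: frequency cut-off, then rank by average MI."""
    # Stage 1: keep context words occurring in at least min_freq training instances.
    doc_freq = Counter(w for ctx, _ in samples for w in set(ctx))
    candidates = sorted(w for w, c in doc_freq.items() if c >= min_freq)
    # Stage 2: rank survivors by average MI with the sense (ties broken by the word).
    ranked = sorted(candidates, key=lambda w: (-average_mutual_information(w, samples), w))
    return ranked[:top_k]


# Hypothetical toy data for the verb 打: (context words, sense label).
samples = [
    (["给", "他", "电话"], "call"),  # 打电话: make a phone call
    (["电话", "我"], "call"),
    (["酱油", "去"], "buy"),        # 打酱油: buy soy sauce
    (["酱油", "他"], "buy"),
]
print(select_features(samples, min_freq=2, top_k=3))
```

On this toy data the frequency stage drops the singletons 给, 我 and 去. The ranking stage puts 电话 and 酱油 first because each fully determines the sense, and puts 他 last because it is independent of the sense and has zero average mutual information.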

Key words: word sense disambiguation (WSD) and tagging, maximum entropy model, contextual features, feature selection
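The abstract states that the model parameters are computed with GIS. The following is a minimal Python sketch of a conditional maximum entropy model p(sense | context) for one ambiguous word, trained with GIS. It is not the author's Delphi implementation. The feature template is assumed to be "context word w co-occurs with sense s", and the class name `MaxEntWSD`, the slack-feature choice C = max + 1, the iteration count and the toy data are all hypothetical. The linguistic rules the paper uses against data sparseness are not modeled.

```python
import math
from collections import defaultdict


class MaxEntWSD:
    """Toy conditional maximum entropy model p(sense | context) trained with GIS."""

    def __init__(self, samples, vocab, iterations=100):
        self.senses = sorted({sense for _, sense in samples})
        vocab = set(vocab)
        contexts = [set(ctx) & vocab for ctx, _ in samples]
        n = len(samples)
        # Empirical expectation of each binary feature f_(w,s)(x, y) = [w in x and y == s].
        emp = defaultdict(float)
        for ctx, (_, sense) in zip(contexts, samples):
            for w in ctx:
                emp[(w, sense)] += 1.0 / n
        self.lam = {f: 0.0 for f in emp}
        self.lam_slack = 0.0
        # GIS needs the same feature total C for every (x, y); a slack feature fills
        # the gap. Choosing C = max + 1 keeps the slack value strictly positive.
        self.C = 1 + max(self._count(ctx, s) for ctx in contexts for s in self.senses)
        emp_slack = sum(self.C - self._count(ctx, s)
                        for ctx, (_, s) in zip(contexts, samples)) / n

        for _ in range(iterations):
            expected = defaultdict(float)
            expected_slack = 0.0
            for ctx in contexts:
                for s, p in self.prob(ctx).items():
                    for w in ctx:
                        if (w, s) in self.lam:
                            expected[(w, s)] += p / n
                    expected_slack += p * (self.C - self._count(ctx, s)) / n
            # GIS update: lambda_i += (1 / C) * log(E_emp[f_i] / E_model[f_i]).
            for f in self.lam:
                self.lam[f] += math.log(emp[f] / expected[f]) / self.C
            self.lam_slack += math.log(emp_slack / expected_slack) / self.C

    def _count(self, ctx, s):
        """Number of ordinary features active for (context, sense)."""
        return sum(1 for w in set(ctx) if (w, s) in self.lam)

    def prob(self, ctx):
        """p(sense | ctx) via a numerically stable softmax over the senses."""
        scores = {}
        for s in self.senses:
            active = [w for w in set(ctx) if (w, s) in self.lam]
            scores[s] = (sum(self.lam[(w, s)] for w in active)
                         + self.lam_slack * (self.C - len(active)))
        top = max(scores.values())
        z = sum(math.exp(v - top) for v in scores.values())
        return {s: math.exp(v - top) / z for s, v in scores.items()}

    def predict(self, ctx):
        """Tag the ambiguous word with its most probable sense."""
        probs = self.prob(ctx)
        return max(probs, key=probs.get)


# Hypothetical toy data for the verb 打: (context words, sense label).
samples = [
    (["给", "他", "电话"], "call"),  # 打电话: make a phone call
    (["电话", "我"], "call"),
    (["酱油", "去"], "buy"),        # 打酱油: buy soy sauce
    (["酱油", "他"], "buy"),
]
model = MaxEntWSD(samples, vocab=["电话", "酱油", "他"], iterations=100)
print(model.predict(["电话", "他"]), model.predict(["酱油"]))
```

Context words outside `vocab` never fire a feature, so an unseen word is simply ignored at tagging time. This is the kind of sparseness the paper's linguistic rules are meant to address. Because the toy data are perfectly separable, the weights keep growing slowly as more iterations are run, while the predicted senses stay the same.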