Computer Engineering, 2010, Vol. 36, Issue 10: 231-232. doi: 10.3969/j.issn.1000-3428.2010.10.080

• Artificial Intelligence and Recognition Technology •

Second-order Hidden Markov Model Based on Context

LIU Jie-bin1, SONG Mao-qiang1, ZHAO Fang1, YANG Zhi-yu2

  (1. College of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876; 2. College of Software, Beihang University, Beijing 100083)
  • Online: 2010-05-20 Published: 2010-05-20

Abstract: To capture the influence of context on the part of speech of the current word, this paper extends the traditional hidden Markov model (HMM) to a context-based second-order HMM and applies it to Chinese part-of-speech tagging. Because the improved statistical model suffers from data sparsity when training data are limited, an improved smoothing algorithm based on exponential linear interpolation is given to smooth the parameters effectively. Experiments show that the context-based second-order HMM achieves higher part-of-speech tagging accuracy and a higher disambiguation rate than the traditional HMM.

Key words: part-of-speech tagging, second-order hidden Markov model (HMM), parameter smoothing, Viterbi algorithm

CLC number:
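
The ingredients named in the abstract (a second-order HMM, interpolation-based parameter smoothing, and Viterbi decoding) can be illustrated with a minimal sketch. This is not the authors' model: it conditions only the tag transition on the two preceding tags, uses ordinary fixed-weight linear interpolation rather than the paper's improved exponential variant, and applies add-one smoothing to emissions. The class name `SecondOrderHMM`, the padding tag `START`, the interpolation weights, and the toy English corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical padding tag marking the sentence start


class SecondOrderHMM:
    def __init__(self, tagged_sentences, lambdas=(0.1, 0.3, 0.6)):
        # Assumed fixed weights for the unigram, bigram and trigram estimates.
        self.l1, self.l2, self.l3 = lambdas
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2, self.emit = Counter(), Counter(), Counter()
        vocab = set()
        for sent in tagged_sentences:
            tags = [START, START] + [t for _, t in sent]
            for word, tag in sent:
                self.emit[(tag, word)] += 1
                vocab.add(word)
            for i in range(2, len(tags)):
                a, b, c = tags[i - 2], tags[i - 1], tags[i]
                self.uni[c] += 1
                self.bi[(b, c)] += 1
                self.tri[(a, b, c)] += 1
                self.ctx1[b] += 1        # how often b occurs as a one-tag history
                self.ctx2[(a, b)] += 1   # how often (a, b) occurs as a two-tag history
        self.tags = sorted(self.uni)
        self.total = sum(self.uni.values())
        self.vocab_size = len(vocab)

    def trans(self, a, b, c):
        """Interpolated estimate of P(c | a, b) from trigram, bigram and unigram counts."""
        def ratio(num, den):
            return num / den if den else 0.0
        return (self.l3 * ratio(self.tri[(a, b, c)], self.ctx2[(a, b)])
                + self.l2 * ratio(self.bi[(b, c)], self.ctx1[b])
                + self.l1 * ratio(self.uni[c], self.total))

    def emis(self, tag, word):
        """Add-one smoothed P(word | tag); the extra +1 reserves mass for unknown words."""
        return (self.emit[(tag, word)] + 1) / (self.uni[tag] + self.vocab_size + 1)

    def viterbi(self, words):
        """Return the most probable tag sequence; each state is a pair of adjacent tags."""
        best = {(START, START): (0.0, [])}
        for word in words:
            nxt = {}
            for (a, b), (logp, path) in best.items():
                for c in self.tags:
                    score = logp + math.log(self.trans(a, b, c)) + math.log(self.emis(c, word))
                    if (b, c) not in nxt or score > nxt[(b, c)][0]:
                        nxt[(b, c)] = (score, path + [c])
            best = nxt
        return max(best.values(), key=lambda item: item[0])[1]


# Toy training data (hypothetical): "run" is ambiguous between noun and verb.
corpus = [
    [("the", "D"), ("dog", "N"), ("runs", "V")],
    [("the", "D"), ("run", "N"), ("ends", "V")],
    [("dogs", "N"), ("run", "V")],
    [("the", "D"), ("cat", "N"), ("runs", "V")],
]
model = SecondOrderHMM(corpus)
print(model.viterbi(["the", "run", "ends"]))
```

Treating each pair of adjacent tags as a single state reduces the second-order model to a first-order one, so the standard Viterbi recursion applies unchanged. Because the unigram term is always positive for tags seen in training, every transition keeps nonzero probability even when its trigram and bigram counts are zero, which is the data-sparsity problem that smoothing is meant to address.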