Abstract: To capture the influence of context on the part of speech of the current word, this paper proposes a context-based second-order hidden Markov model (HMM) that extends the traditional HMM, and applies it to Chinese part-of-speech tagging. Because the improved statistical model suffers from data sparseness when training data are scarce, an improved smoothing algorithm based on exponential linear interpolation is given to smooth the parameters effectively. Experiments show that the context-based second-order HMM achieves a higher tagging accuracy and a higher disambiguation rate than the traditional HMM.
Key words:
part-of-speech tagging,
second-order Hidden Markov Model (HMM),
parameter smoothing,
Viterbi algorithm
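As a rough illustration of the ingredients named in the abstract, the sketch below shows a generic second-order (trigram) HMM tagger decoded with the Viterbi algorithm. It is not the authors' model: the abstract does not give the context-dependent formulation or the exponential interpolation scheme, so this sketch assumes ordinary linear interpolation with fixed weights for the tag transitions and add-one smoothing for the emissions. All function names, the weights in `lambdas`, and the toy corpus with tags `r`, `v`, `ns` are hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # padding tag placed before each sentence (assumption of this sketch)

def train(tagged_sentences):
    """Collect tag n-gram counts and (tag, word) counts from [(word, tag), ...] lists."""
    c = {k: Counter() for k in ("uni", "bi", "tri", "ctx1", "ctx2", "emit")}
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent]
        for i in range(2, len(tags)):
            a, b, t = tags[i - 2], tags[i - 1], tags[i]
            c["uni"][t] += 1
            c["bi"][(b, t)] += 1
            c["tri"][(a, b, t)] += 1
            c["ctx1"][b] += 1        # how often b occurs as a one-tag history
            c["ctx2"][(a, b)] += 1   # how often (a, b) occurs as a two-tag history
        for w, t in sent:
            c["emit"][(t, w)] += 1
            vocab.add(w)
    c["n"] = sum(c["uni"].values())
    c["v"] = len(vocab)
    return c

def ratio(num, den):
    return num / den if den else 0.0

def p_trans(c, a, b, t, lambdas=(0.1, 0.3, 0.6)):
    """P(t | a, b) as a fixed-weight mix of unigram, bigram and trigram estimates."""
    l1, l2, l3 = lambdas  # illustrative weights, not tuned and not from the paper
    return (l1 * ratio(c["uni"][t], c["n"])
            + l2 * ratio(c["bi"][(b, t)], c["ctx1"][b])
            + l3 * ratio(c["tri"][(a, b, t)], c["ctx2"][(a, b)]))

def p_emit(c, t, w):
    """P(w | t) with add-one smoothing; one extra slot is reserved for unseen words."""
    return (c["emit"][(t, w)] + 1) / (c["uni"][t] + c["v"] + 1)

def viterbi(c, words):
    """Return the most probable tag sequence; each state is a pair (previous tag, tag)."""
    tags = list(c["uni"])
    best = {(START, START): (0.0, [])}  # state -> (log probability, best path so far)
    for w in words:
        new = {}
        for (a, b), (score, path) in best.items():
            for t in tags:
                s = score + math.log(p_trans(c, a, b, t)) + math.log(p_emit(c, t, w))
                if (b, t) not in new or s > new[(b, t)][0]:
                    new[(b, t)] = (s, path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

# Toy usage with a two-sentence corpus (hypothetical data).
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("上海", "ns")],
]
model = train(corpus)
print(viterbi(model, ["他", "爱", "北京"]))
```

Pairing the previous tag with the current tag in each Viterbi state is what makes the model second-order: the transition score can then condition on two tags of history, at the cost of a state space that grows with the square of the tagset size.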
LIU Jie-bin, SONG Mao-qiang, ZHAO Fang, YANG Zhi-yu. Second-order Hidden Markov Model Based on Context[J]. Computer Engineering (计算机工程), 2010, 36(10): 231-232.