摘要: 提出一种基于改进隐马尔可夫模型(HMM)的文本信息抽取模型。给出一个新假设,使用绝对平滑算法对模型参数进行平滑,利用Viterbi算法对观察值序列进行正序和逆序解码,基于N-Gram模型对2次解码结果进行对比消歧,得到较准确的状态序列。实验结果表明,该信息抽取模型能提高信息抽取的准确率。
关键词: 隐马尔可夫模型, 绝对平滑, 观察值, 信息抽取, 引文信息
Abstract: This paper proposes a text information extraction model based on an improved Hidden Markov Model (HMM). It introduces a new assumption on observation emission and smooths the model parameters with the absolute smoothing algorithm. The Viterbi algorithm is used to decode the observation sequence in both forward and reverse order, and the two decoding results are compared and disambiguated with an N-Gram model to obtain a more accurate state sequence. Experimental results indicate that the model improves the precision of information extraction.
Key words: Hidden Markov Model (HMM), absolute smoothing, observation, information extraction, citation information
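The pipeline summarized in the abstract (smoothed HMM parameters, Viterbi decoding in both directions, then disambiguation of the two results) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the new emission assumption, the exact smoothing formula, or the N-Gram scoring rule. The sketch assumes a standard first-order HMM, an absolute-discounting variant that shares the discounted mass equally among unseen events, and a state-bigram tie-break that uses only the left context. All names (`absolute_discount`, `train`, `viterbi`, `disambiguate`, `decode_both_ways`, the `<unk>` token, the discount `d`) and the toy citation data are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words never seen in training

def absolute_discount(counts, events, d=0.5):
    """Assumed variant: subtract d from every seen count and share the
    freed probability mass equally among the unseen events."""
    total = sum(counts.values())
    if total == 0:                      # no evidence at all: uniform
        return {e: 1.0 / len(events) for e in events}
    seen = [e for e in events if counts[e] > 0]
    unseen = [e for e in events if counts[e] == 0]
    if not unseen:                      # nothing to give mass to: plain MLE
        return {e: counts[e] / total for e in seen}
    probs = {e: (counts[e] - d) / total for e in seen}
    share = d * len(seen) / total / len(unseen)
    probs.update({e: share for e in unseen})
    return probs

def train(tagged, states, d=0.5):
    """Estimate smoothed start, transition and emission tables."""
    vocab = sorted({w for seq in tagged for w, _ in seq} | {UNK})
    start = Counter()
    trans = {s: Counter() for s in states}
    emit = {s: Counter() for s in states}
    for seq in tagged:
        prev = None
        for word, state in seq:
            emit[state][word] += 1
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            prev = state
    return (absolute_discount(start, states, d),
            {s: absolute_discount(trans[s], states, d) for s in states},
            {s: absolute_discount(emit[s], vocab, d) for s in states})

def viterbi(obs, states, model):
    """Standard log-space Viterbi; returns the most likely state path."""
    start, trans, emit = model
    def e(s, w):
        return math.log(emit[s].get(w, emit[s][UNK]))
    score = {s: math.log(start[s]) + e(s, obs[0]) for s in states}
    back = []
    for w in obs[1:]:
        prev_score, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev_score[p] + math.log(trans[p][s]))
            score[s] = prev_score[best] + math.log(trans[best][s]) + e(s, w)
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def disambiguate(fwd, bwd, start, trans):
    """Where the two paths disagree, keep the state that the (assumed)
    state-bigram model prefers given the previously chosen state."""
    out = []
    for f, b in zip(fwd, bwd):
        if f == b:
            out.append(f)
        else:
            ctx = start if not out else trans[out[-1]]
            out.append(f if ctx[f] >= ctx[b] else b)
    return out

def decode_both_ways(obs, states, fwd_model, bwd_model):
    fwd = viterbi(obs, states, fwd_model)
    bwd = viterbi(obs[::-1], states, bwd_model)[::-1]  # re-align to forward order
    return disambiguate(fwd, bwd, fwd_model[0], fwd_model[1])

# Toy citation-like data, invented for illustration only.
STATES = ["author", "title", "year"]
TAGGED = [
    [("liang", "author"), ("hmm", "title"), ("extraction", "title"), ("2011", "year")],
    [("tian", "author"), ("text", "title"), ("model", "title"), ("2010", "year")],
]
fwd_model = train(TAGGED, STATES)
bwd_model = train([seq[::-1] for seq in TAGGED], STATES)  # reverse-order model
obs = ["liang", "text", "model", "2011"]
print(decode_both_ways(obs, STATES, fwd_model, bwd_model))
```

The reverse pass needs its own parameters, so the sketch trains a second model on reversed training sequences and flips the decoded path back before comparing it with the forward path. Using only the left context in `disambiguate` is the simplest possible choice; a scoring rule that also looks at the following state would be an equally plausible reading of the abstract.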
中图分类号:
梁吉光, 田俊华, 姜杰. 基于改进HMM的文本信息抽取模型[J]. 计算机工程, 2011, 37(20): 178-179.
LIANG Ji-Guang, TIAN Jun-Hua, JIANG Jie. Text Information Extraction Model Based on Improved HMM[J]. Computer Engineering, 2011, 37(20): 178-179.