
Computer Engineering ›› 2011, Vol. 37 ›› Issue (20): 178-179. doi: 10.3969/j.issn.1000-3428.2011.20.061

• Artificial Intelligence and Recognition Technology •

Text Information Extraction Model Based on Improved HMM

LIANG Ji-guang, TIAN Jun-hua, JIANG Jie

  1. (Educational Science College, Nanjing Normal University, Nanjing 210000, China)
  • Received: 2011-04-13 Online: 2011-10-20 Published: 2011-10-20
  • About the authors: LIANG Ji-guang (born 1987), male, master's student, main research interest: text information extraction; TIAN Jun-hua, associate professor; JIANG Jie, lecturer
  • Supported by:
    Natural Science Basic Research Fund of Jiangsu Higher Education Institutions (08KJD120004); National Education Science Planning Special Fund for Moral Education (GEA090005)

Abstract: This paper proposes a text information extraction model based on an improved Hidden Markov Model (HMM). It introduces a new assumption about observation emission and smooths the model parameters with the absolute smoothing algorithm. The Viterbi algorithm decodes the observation sequence in both forward and reverse order, and an N-Gram model compares the two decoding results to resolve their disagreements, yielding a more accurate state sequence. Experimental results indicate that the model improves the precision of information extraction.

Key words: Hidden Markov Model (HMM), absolute smoothing, observation, information extraction, citation information
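
The pipeline named in the abstract (absolute smoothing, then Viterbi decoding in forward and reverse order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every function name, the toy citation data, and the discount `d = 0.5` are hypothetical. The reverse model is assumed here to be trained on reversed training sequences. The paper's new emission assumption and its N-Gram disambiguation step are not reproduced; the sketch only reports the positions where the two decodings disagree.

```python
import math
from collections import Counter, defaultdict

def absolute_discount(counts, vocab, d=0.5):
    """Absolute discounting: subtract d from every seen count and share the
    freed probability mass equally among the unseen events in vocab."""
    total = sum(counts.values())
    if total == 0:                      # no evidence at all: fall back to uniform
        return {v: 1.0 / len(vocab) for v in vocab}
    seen = sum(1 for v in vocab if counts.get(v, 0) > 0)
    unseen = len(vocab) - seen
    if unseen == 0:                     # nothing to redistribute to
        return {v: counts[v] / total for v in vocab}
    freed = d * seen / total            # total mass taken from seen events
    return {v: (counts[v] - d) / total if counts.get(v, 0) > 0 else freed / unseen
            for v in vocab}

def train(tagged, states, vocab, d=0.5):
    """Estimate smoothed start, transition and emission tables from
    non-empty sequences of (word, state) pairs."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in tagged:
        start[seq[0][1]] += 1
        for word, state in seq:
            emit[state][word] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[a][b] += 1
    return (absolute_discount(start, states, d),
            {s: absolute_discount(trans[s], states, d) for s in states},
            {s: absolute_discount(emit[s], vocab, d) for s in states})

def viterbi(obs, states, model):
    """Most likely state path for obs, computed in log space."""
    start, trans, emit = model
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):          # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return path[::-1]

def decode_both_ways(obs, states, fwd_model, bwd_model):
    """Decode obs forward and reversed; return both paths, aligned to the
    original order, plus the positions where they disagree."""
    fwd = viterbi(obs, states, fwd_model)
    bwd = viterbi(obs[::-1], states, bwd_model)[::-1]
    conflicts = [i for i, (a, b) in enumerate(zip(fwd, bwd)) if a != b]
    return fwd, bwd, conflicts

# Hypothetical toy citation data, for illustration only.
tagged = [
    [("Liang", "author"), ("HMM", "title"), ("extraction", "title"), ("2011", "year")],
    [("Tian", "author"), ("Jiang", "author"), ("text", "title"), ("2010", "year")],
]
states = ["author", "title", "year"]
vocab = sorted({w for seq in tagged for w, _ in seq})

fwd_model = train(tagged, states, vocab)
bwd_model = train([seq[::-1] for seq in tagged], states, vocab)
fwd, bwd, conflicts = decode_both_ways(["Tian", "HMM", "2011"], states, fwd_model, bwd_model)
print(fwd, bwd, conflicts)
```

Positions listed in `conflicts` are where a disambiguation step, such as the N-Gram comparison described in the abstract, would have to choose between the two candidate labels. Observations outside `vocab` are not handled in this sketch.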

CLC Number: