Computer Engineering ›› 2006, Vol. 32 ›› Issue (2): 203-205.

• Artificial Intelligence and Recognition Technology •

Information Extraction Algorithm Based on Multiple Templates Using Hidden Markov Model

ZHONG Minjuan1, HAO Qian2, LIU Yunzhong3   

  1. College of Information Technology, Jiangxi University of Finance and Economy, Nanchang 330013; 2. Department of Mathematics and Computer Science, Jiangxi Science & Technology Normal University, Nanchang 330013; 3. Department of CDMA, ZTE Corporation, Shenzhen 300457
  • Online: 2006-01-20 Published: 2006-01-20

Abstract: To handle training data drawn from diverse sources, this paper proposes a text information extraction algorithm based on a multi-template hidden Markov model. The algorithm first clusters the training data by format into several classes, each representing one template, and then applies hidden Markov models to extract information on the basis of that clustering. Experimental results show that the new algorithm outperforms the original one, which does not cluster the training data into multiple templates, in both recall and precision.

Key words: Information extraction; Hidden Markov model; Multiple templates; Clustering
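
The two-stage idea in the abstract (group training records by format, then fit one hidden Markov model per group) can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the features, or the HMM topology, so everything concrete here is an assumption made for illustration: the punctuation-skeleton `format_key` stands in for the paper's format-based clustering, the HMMs emit coarse token shapes rather than words, training is supervised counting with add-one smoothing, and the toy bibliographic records and field labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

SHAPES = "DPCw"  # digit, punctuation, capitalised, other (assumed feature set)


def shape(token):
    """Map a token to a coarse shape symbol; the HMMs emit these symbols."""
    if token.isdigit():
        return "D"
    if not token.isalnum():
        return "P"
    return "C" if token[0].isupper() else "w"


def format_key(tokens):
    """Hypothetical format feature: the record's punctuation skeleton."""
    return tuple(t for t in tokens if not t.isalnum())


class HMM:
    """Supervised first-order HMM with add-one smoothing and Viterbi decoding."""

    def __init__(self, sequences):
        # sequences: list of non-empty [(token, state), ...] records
        self.start = Counter()
        self.trans = defaultdict(Counter)
        self.emit = defaultdict(Counter)
        for seq in sequences:
            self.start[seq[0][1]] += 1
            for (_, a), (_, b) in zip(seq, seq[1:]):
                self.trans[a][b] += 1
            for tok, state in seq:
                self.emit[state][shape(tok)] += 1
        self.states = sorted(self.emit)

    @staticmethod
    def _logp(counter, key, n_outcomes):
        # Add-one smoothed log-probability over n_outcomes possible outcomes.
        return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

    def viterbi(self, tokens):
        n, v = len(self.states), len(SHAPES)
        obs = [shape(t) for t in tokens]
        score = {s: self._logp(self.start, s, n) + self._logp(self.emit[s], obs[0], v)
                 for s in self.states}
        back = []
        for o in obs[1:]:
            prev, score, ptr = score, {}, {}
            for s in self.states:
                best = max(self.states,
                           key=lambda p: prev[p] + self._logp(self.trans[p], s, n))
                ptr[s] = best
                score[s] = (prev[best] + self._logp(self.trans[best], s, n)
                            + self._logp(self.emit[s], o, v))
            back.append(ptr)
        # Follow back-pointers from the best final state.
        state = max(score, key=score.get)
        path = [state]
        for ptr in reversed(back):
            state = ptr[state]
            path.append(state)
        return path[::-1]


def train_templates(labelled):
    """Fit one HMM per format group, plus a single-model baseline on all data."""
    groups = defaultdict(list)
    for seq in labelled:
        groups[format_key([tok for tok, _ in seq])].append(seq)
    return {key: HMM(seqs) for key, seqs in groups.items()}, HMM(labelled)


def extract(tokens, templates, fallback):
    """Label tokens with the matching template's HMM, else the baseline."""
    model = templates.get(format_key(tokens), fallback)
    return list(zip(tokens, model.viterbi(tokens)))


def record(tokens, states):
    return list(zip(tokens.split(), states.split()))


# Hypothetical toy training data in two formats.
TRAIN = [
    record("Smith , Parsing , 2001", "AUTHOR SEP TITLE SEP YEAR"),
    record("Brown , Indexing , 1998", "AUTHOR SEP TITLE SEP YEAR"),
    record("Jones ( 2003 ) Tagging", "AUTHOR SEP YEAR SEP TITLE"),
    record("Miller ( 1995 ) Ranking", "AUTHOR SEP YEAR SEP TITLE"),
]
templates, fallback = train_templates(TRAIN)
print(extract("Lee , Chunking , 1999".split(), templates, fallback))
print(extract("Wang ( 2010 ) Stemming".split(), templates, fallback))
```

In this sketch, a record whose punctuation skeleton matches no known template is decoded with the model trained on all records, which loosely corresponds to the single-model baseline the abstract compares against. A real system would likely need a softer similarity measure than exact key matching.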