
Computer Engineering, 2006, Vol. 32, Issue (2): 203-205.

Artificial Intelligence and Recognition Technology

Information Extraction Algorithm Based on Multiple Templates Using Hidden Markov Model

ZHONG Minjuan (1), HAO Qian (2), LIU Yunzhong (3)

1. College of Information Technology, Jiangxi University of Finance and Economy, Nanchang 330013
2. Department of Mathematics and Computer Science, Jiangxi Science & Technology Normal University, Nanchang 330013
3. Department of CDMA, ZTE Corporation, Shenzhen 300457
Online: 2006-01-20; Published: 2006-01-20


Abstract: Because training data come from diverse sources, this paper proposes a text information extraction algorithm based on a multi-template hidden Markov model (HMM). The algorithm first clusters the training data by format into several clusters, each of which represents one template, and then applies an HMM on top of this clustering to extract information from text. Experimental results show that the new algorithm achieves higher recall and precision than the original algorithm, which does not cluster the training data into multiple templates.
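The abstract describes a two-stage pipeline: cluster the training documents by format into templates, then train and apply an HMM per template. The abstract does not give the paper's format features, clustering method, or HMM structure, so the Python sketch below is only an illustration under stated assumptions. Format is approximated by three hypothetical features (shares of digit, non-alphanumeric, and capitalized tokens). Clustering uses greedy leader (threshold) clustering as a stand-in. Each template's HMM is trained by supervised counting with add-one smoothing and decoded with the Viterbi algorithm. A new document is routed to the template whose leader vector is nearest. All names (`format_vector`, `leader_cluster`, `TemplateHMM`, `train_templates`, `extract`) and the field labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

# A labeled training document is a non-empty list of (token, state) pairs;
# states are field labels such as "title" or "author" (hypothetical labels).


def format_vector(tokens):
    """Hypothetical format features: shares of digit, non-alphanumeric,
    and capitalized tokens. A stand-in for the paper's format features."""
    n = max(len(tokens), 1)
    digits = sum(t.isdigit() for t in tokens) / n
    punct = sum(not t.isalnum() for t in tokens) / n
    caps = sum(t[:1].isupper() for t in tokens) / n
    return (digits, punct, caps)


def distance(a, b):
    """Euclidean distance between two format vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(docs, threshold):
    """Greedy leader clustering: a document joins the cluster with the nearest
    leader if it lies within `threshold`; otherwise it becomes the leader of a
    new cluster. Each cluster plays the role of one template."""
    leaders, clusters = [], []
    for doc in docs:
        vec = format_vector([tok for tok, _ in doc])
        dists = [distance(vec, lead) for lead in leaders]
        if dists and min(dists) <= threshold:
            clusters[dists.index(min(dists))].append(doc)
        else:
            leaders.append(vec)
            clusters.append([doc])
    return leaders, clusters


class TemplateHMM:
    """First-order HMM estimated by counting labeled sequences (add-one smoothing)."""

    def __init__(self, docs):
        self.states = sorted({s for doc in docs for _, s in doc})
        vocab = {t for doc in docs for t, _ in doc}
        start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
        for doc in docs:
            start[doc[0][1]] += 1
            for (_, a), (_, b) in zip(doc, doc[1:]):
                trans[a][b] += 1
            for tok, s in doc:
                emit[s][tok] += 1
        n_states = len(self.states)
        self.log_start = {s: math.log((start[s] + 1) / (len(docs) + n_states))
                          for s in self.states}
        self.log_trans = {a: {b: math.log((trans[a][b] + 1) / (sum(trans[a].values()) + n_states))
                              for b in self.states} for a in self.states}
        self.emit = emit
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def log_emit(self, state, tok):
        counts = self.emit[state]
        return math.log((counts[tok] + 1) / (sum(counts.values()) + self.vocab_size))

    def viterbi(self, tokens):
        """Most likely state sequence for a non-empty token list (log space)."""
        scores = [{s: self.log_start[s] + self.log_emit(s, tokens[0]) for s in self.states}]
        back = []
        for tok in tokens[1:]:
            col, ptr = {}, {}
            for s in self.states:
                prev = max(self.states, key=lambda p: scores[-1][p] + self.log_trans[p][s])
                col[s] = scores[-1][prev] + self.log_trans[prev][s] + self.log_emit(s, tok)
                ptr[s] = prev
            scores.append(col)
            back.append(ptr)
        path = [max(self.states, key=lambda s: scores[-1][s])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]


def train_templates(docs, threshold=0.2):
    """Cluster training documents by format, then train one HMM per cluster."""
    leaders, clusters = leader_cluster(docs, threshold)
    return [(lead, TemplateHMM(c)) for lead, c in zip(leaders, clusters)]


def extract(templates, tokens):
    """Route tokens to the nearest template and label them with its HMM."""
    vec = format_vector(tokens)
    _, hmm = min(templates, key=lambda t: distance(vec, t[0]))
    return list(zip(tokens, hmm.viterbi(tokens)))
```

A typical call is `extract(train_templates(docs), tokens)`. Documents whose format vectors fall within `threshold` of an existing leader share a template and therefore share an HMM. This separation is the point of the multi-template design: each HMM only has to model one layout. The single-template baseline corresponds to training one `TemplateHMM` on all documents.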

Key words: Information extraction; Hidden Markov model; Multiple templates; Clustering
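The abstract reports that the multi-template algorithm improves both recall and precision. It does not say whether these are counted per token, per field value, or per document. The Python sketch below therefore assumes token-level scoring for one field label. Precision is the fraction of tokens labeled with the field that are correct. Recall is the fraction of gold tokens for that field that were found. `precision_recall` is a hypothetical helper name.

```python
def precision_recall(gold, predicted, field):
    """Token-level precision and recall for one field label.

    `gold` and `predicted` are aligned label sequences of equal length
    (an assumed evaluation unit, not necessarily the paper's)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be aligned")
    # True positives: tokens where both gold and prediction carry `field`.
    tp = sum(g == p == field for g, p in zip(gold, predicted))
    extracted = sum(p == field for p in predicted)  # tokens the model labeled as `field`
    relevant = sum(g == field for g in gold)        # tokens that truly belong to `field`
    precision = tp / extracted if extracted else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall
```

For gold labels `["other", "title", "title", "other", "author"]` and predictions `["other", "title", "title", "other", "title"]`, the `"title"` field has two correct tokens out of three extracted, giving precision 2/3 and recall 1.0.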
