Computer Engineering (计算机工程), 2007, Vol. 33, Issue (19): 190-192. doi: 10.3969/j.issn.1000-3428.2007.19.067

• Artificial Intelligence and Recognition Technology •

基于隐马尔可夫模型的中文科研论文信息抽取

于江德1,2,樊孝忠1,尹继豪1,顾益军1   

  (1. 北京理工大学计算机科学技术学院,北京 100081;2. 安阳师范学院计算机科学系,安阳 455000)
  • 出版日期:2007-10-05 发布日期:2007-10-05

Information Extraction from Chinese Research Papers Based on Hidden Markov Model

YU Jiang-de1,2, FAN Xiao-zhong1, YIN Ji-hao1, GU Yi-jun1   

  (1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081; 2. Department of Computer Science, Anyang Normal College, Anyang 455000)
  • Online: 2007-10-05 Published: 2007-10-05

摘要: 随着大量的科研论文出现在互联网上,从中精确地抽取论文头部信息和引文信息显得十分重要。该文提出了一种基于隐马尔可夫模型的中文科研论文头部信息和引文信息抽取算法,分析了模型结构的学习和参数估计方法。在进行信息抽取时,利用分隔符、特定标识符等格式信息对文本进行分块,利用隐马尔可夫模型进行指定域的抽取。实验结果表明,该算法具有良好的准确率和召回率。

关键词: 隐马尔可夫模型, 信息抽取, 论文头部信息

Abstract: As large numbers of research papers appear on the Internet, accurately extracting their header information and citation information has become very important. This paper proposes an algorithm based on the hidden Markov model (HMM) for extracting header and citation information from Chinese research papers, and analyzes how the model structure is learned and how its parameters are estimated. During extraction, the algorithm uses format information such as delimiters and specific identifiers to segment the text into blocks, then uses the HMM to extract the specified fields. Experimental results show that the algorithm achieves good precision and recall.

Key words: hidden Markov model, information extraction, paper header information
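
The two-stage procedure described in the abstract (segment the text into blocks using format cues, then label the blocks with an HMM) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the state set, the observation vocabulary, or the learned parameters, so the states, the coarse block symbols (`plain`, `has_commas`, `has_postcode`), the delimiters, and all probabilities below are hypothetical, hand-set values. In the paper the model structure and parameters are learned from data. Decoding uses the standard Viterbi algorithm.

```python
import math
import re

def split_blocks(text):
    """Split text into blocks at line breaks and semicolons (assumed delimiters)."""
    parts = re.split(r"[\n;;]+", text)
    return [p.strip() for p in parts if p.strip()]

def block_symbol(block):
    """Map a block to a coarse observation symbol (hypothetical features)."""
    if re.search(r"\d{6}", block):        # six digits in a row: likely a postal code
        return "has_postcode"
    if "," in block or "," in block:     # comma-separated items: likely a name list
        return "has_commas"
    return "plain"

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for obs, computed in log space."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    score = [{s: lg(start_p[s]) + lg(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + lg(trans_p[r][s]))
            cur[s] = prev[best] + lg(trans_p[best][s]) + lg(emit_p[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hand-set toy parameters (assumptions for illustration, not learned values).
STATES = ["title", "author", "affiliation"]
START = {"title": 0.8, "author": 0.1, "affiliation": 0.1}
TRANS = {
    "title":       {"title": 0.10, "author": 0.80, "affiliation": 0.10},
    "author":      {"title": 0.05, "author": 0.15, "affiliation": 0.80},
    "affiliation": {"title": 0.05, "author": 0.05, "affiliation": 0.90},
}
EMIT = {
    "title":       {"plain": 0.8, "has_commas": 0.1, "has_postcode": 0.1},
    "author":      {"plain": 0.2, "has_commas": 0.7, "has_postcode": 0.1},
    "affiliation": {"plain": 0.1, "has_commas": 0.2, "has_postcode": 0.7},
}

def label_header(text):
    """Segment a paper header into blocks and assign one field label per block."""
    blocks = split_blocks(text)
    symbols = [block_symbol(b) for b in blocks]
    labels = viterbi(symbols, STATES, START, TRANS, EMIT)
    return list(zip(labels, blocks))

EXAMPLE_HEADER = (
    "Information Extraction from Chinese Research Papers Based on Hidden Markov Model\n"
    "YU Jiang-de, FAN Xiao-zhong, YIN Ji-hao, GU Yi-jun\n"
    "School of Computer Science and Technology, Beijing Institute of Technology, "
    "Beijing 100081; Department of Computer Science, Anyang Normal College, Anyang 455000"
)

if __name__ == "__main__":
    for label, block in label_header(EXAMPLE_HEADER):
        print(f"{label:12s} {block}")
```

With these toy parameters, `label_header(EXAMPLE_HEADER)` splits the header into four blocks and labels them title, author, affiliation, affiliation. Working in log space avoids numeric underflow on longer block sequences.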
