Information Extraction from Chinese Research Papers Based on Hidden Markov Model

Computer Engineering ›› 2007, Vol. 33 ›› Issue (19): 190-192. doi: 10.3969/j.issn.1000-3428.2007.19.067

• Artificial Intelligence and Recognition Technology •

YU Jiang-de (1,2), FAN Xiao-zhong (1), YIN Ji-hao (1), GU Yi-jun (1)

(1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081; 2. Department of Computer Science, Anyang Normal College, Anyang 455000)

Online: 2007-10-05 Published: 2007-10-05

基于隐马尔可夫模型的中文科研论文信息抽取

于江德1,2，樊孝忠1，尹继豪1，顾益军1

（1. 北京理工大学计算机科学技术学院，北京 100081；2. 安阳师范学院计算机科学系，安阳 455000）

Abstract: With large numbers of research papers now available on the Internet, accurately extracting header information and citation information from them has become important. This paper proposes an algorithm based on the hidden Markov model (HMM) for extracting header and citation information from Chinese research papers, and analyzes how the model structure is learned and how its parameters are estimated. During extraction, the algorithm uses format information such as list separators and special labels to segment the text into blocks, then applies the hidden Markov model to extract the specified fields. Experimental results show that the algorithm achieves good precision and recall.
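The two-stage idea in the abstract (split the text at separators, then let an HMM label each block) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the state set, the coarse block symbols (`SHORT`, `LONG`, `YEAR4`, `NUMERIC`), the separator pattern, and the hand-set probabilities are hypothetical, whereas the paper learns the model structure and estimates parameters from training data. Decoding uses the standard Viterbi algorithm.

```python
import math
import re

# Hypothetical field states and hand-set probabilities (illustration only;
# the paper estimates its parameters from labeled training data).
STATES = ["author", "title", "journal", "year", "pages"]
FLOOR = 1e-6  # small floor used instead of zero for unlisted events
START = {"author": 0.9, "title": 0.1}
TRANS = {
    "author": {"author": 0.6, "title": 0.4},
    "title": {"title": 0.1, "journal": 0.9},
    "journal": {"journal": 0.1, "year": 0.9},
    "year": {"year": 0.1, "pages": 0.9},
    "pages": {"pages": 1.0},
}
EMIT = {
    "author": {"SHORT": 0.9, "LONG": 0.1},
    "title": {"LONG": 0.8, "SHORT": 0.2},
    "journal": {"SHORT": 0.7, "LONG": 0.3},
    "year": {"YEAR4": 0.9, "NUMERIC": 0.1},
    "pages": {"NUMERIC": 0.9, "YEAR4": 0.1},
}


def segment(citation):
    """Split a citation into blocks at separators and the [J] label."""
    parts = re.split(r"\[J\]\.?|[;,.]\s+|\.$", citation)
    return [p.strip() for p in parts if p.strip()]


def symbol(block):
    """Map a text block to a coarse observation symbol."""
    if re.fullmatch(r"\d{4}", block):
        return "YEAR4"
    if re.search(r"\d", block):
        return "NUMERIC"
    return "SHORT" if len(block.split()) <= 3 else "LONG"


def logp(table, key):
    return math.log(table.get(key, FLOOR))


def viterbi(symbols):
    """Return the most likely state sequence for the observed symbols."""
    if not symbols:
        return []
    score = {s: logp(START, s) + logp(EMIT[s], symbols[0]) for s in STATES}
    back = []
    for sym in symbols[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + logp(TRANS[p], s))
            new_score[s] = score[prev] + logp(TRANS[prev], s) + logp(EMIT[s], sym)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # follow back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]


def extract(citation):
    """Label each block of a citation with its most likely field."""
    blocks = segment(citation)
    return list(zip(viterbi([symbol(b) for b in blocks]), blocks))
```

Calling `extract` on a citation string returns `(field, block)` pairs in order. Because the sketch splits on every comma, a title that itself contains a comma would be broken into several blocks; a real system would need richer separator rules and features.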

Key words: hidden Markov model, information extraction, paper header information

CLC Number: TP391

YU Jiang-de; FAN Xiao-zhong; YIN Ji-hao; GU Yi-jun. Information Extraction from Chinese Research Papers Based on Hidden Markov Model[J]. Computer Engineering, 2007, 33(19): 190-192.

于江德;樊孝忠;尹继豪;顾益军. 基于隐马尔可夫模型的中文科研论文信息抽取[J]. 计算机工程, 2007, 33(19): 190-192.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2007.19.067

http://www.ecice06.com/EN/Y2007/V33/I19/190
