Web Information Extraction Based on Sub-tree Breadth

Computer Engineering, 2009, Vol. 35, Issue 3: 89-90,9. doi: 10.3969/j.issn.1000-3428.2009.03.031

Software Technology and Database

WANG Quan, SHI Shao-ting

(Institute of Science & Technology Information of Gansu, Lanzhou 730000)

Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-02-05 Published: 2009-02-05

Abstract

Abstract: This paper proposes a new Web page information extraction method that uses the breadth of a sub-tree to extract useful information automatically from the pages of different scientific and technical document Web sites, without treating each site separately. The method was evaluated experimentally on a large number of Web pages from different document Web sites and has been applied successfully to the Gansu science and technology document sharing platform. The experimental results show that the method extracts the relevant information automatically regardless of which Web site a page comes from, and that it achieves high recall and precision.

Key words: sub-tree breadth, information extraction, cross-search
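The abstract does not define sub-tree breadth or give the extraction procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the breadth of a sub-tree is the number of direct child elements of its root, and that the element with the greatest breadth is the container of the repeated records (for example, a result list on a document site). The names `Node`, `TreeBuilder`, `breadth`, and `extract_records` are hypothetical and introduced here for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (subset, enough for the sketch).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the parsed tag tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from HTML using the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def iter_nodes(node):
    """Yield every node of the sub-tree rooted at node, in pre-order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def breadth(node):
    """Assumed definition: number of direct child elements."""
    return len(node.children)


def text_of(node):
    """Text of a sub-tree: the node's own text first, then its children's."""
    parts = list(node.text)
    for child in node.children:
        parts.append(text_of(child))
    return " ".join(p for p in parts if p)


def extract_records(html):
    """Return the text of each child of the broadest sub-tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    widest = max(iter_nodes(builder.root), key=breadth)
    return [text_of(child) for child in widest.children]


page = ("<html><body><div><a>Home</a><a>About</a></div>"
        "<ul><li>Paper A</li><li>Paper B</li><li>Paper C</li></ul>"
        "</body></html>")
records = extract_records(page)
```

Because the sketch looks only at tree shape and not at site-specific tag names or templates, the same function can be run on pages from different sites, which is the property the abstract emphasizes. A real page would likely need additional filtering (for instance, of wide navigation menus), which the abstract does not describe.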

CLC Number: TP393

WANG Quan; SHI Shao-ting. Web Information Extraction Based on Sub-tree Breadth[J]. Computer Engineering, 2009, 35(3): 89-90,9.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2009.03.031

http://www.ecice06.com/EN/Y2009/V35/I3/89
