Dynamical Data Regions Identification and Extraction in Web Pages

doi:10.3969/j.issn.1000-3428.2007.11.020

Computer Engineering, 2007, Vol. 33, Issue (11): 53-55,5.

Software Technology and Database


HUANG Jianbin (1, 2), JI Hongbing (1), SUN Heli (3)

(1. School of Electronic Engineering, Xidian University, Xi’an 710071; 2. School of Computer Science, Xidian University, Xi’an 710071; 3. Department of Computer Science & Technology, Xi’an Jiaotong University, Xi’an 710049)

Online: 2007-06-05; Published: 2007-06-05


Abstract

Abstract: This paper presents an improved approach that searches the HTML tag tree for data blocks in order to mine the data regions embedded in a Web page. On this basis, it combines Web page clustering with cross-page data region matching to identify the dynamic data regions of a page automatically. Experimental results show that the approach improves both the recall and the precision of dynamic data region identification in Web pages.

Key words: Web data region extraction, dynamic data region identification, cross-page analysis
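The abstract names three steps (data-block search in the HTML tag tree, Web page clustering, and cross-page data region matching) but does not give their details. The Python sketch below illustrates one possible reading of that pipeline; it is not the authors' algorithm. Its assumptions are stated here and in the code comments: a data region is taken to be a node with at least two adjacent, structurally identical, non-leaf child subtrees (a simplified data-block test); pages are clustered greedily by Jaccard similarity of their tag-path sets, with a hypothetical threshold of 0.6; and a region found at the same indexed tag path on every page of a cluster is labeled "dynamic" if its text varies across pages and "static" (for example, a navigation menu) if it does not. All names (`find_data_regions`, `cluster_pages`, `label_regions`, `PAGES`) are hypothetical, and only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # elements without an end tag

class Node:
    """Minimal tag-tree node: tag name, parent, children and the node's own text."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], []
    def full_text(self):  # own text, then the text of all descendants
        parts = self.text + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)
    def signature(self):  # structure only: tag plus child signatures, text ignored
        return self.tag + "(" + ",".join(c.signature() for c in self.children) + ")"

class TreeBuilder(HTMLParser):
    """Builds a Node tree with the standard-library HTMLParser."""
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root")
    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node
    def handle_endtag(self, tag):  # close nearest open element with this tag; ignore strays
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None:
            self.cur = n.parent
    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def path_of(node):  # indexed tag path from the root, e.g. "html[0]/body[0]/ul[0]"
    parts = []
    while node.parent is not None:
        parts.append(f"{node.tag}[{node.parent.children.index(node)}]")
        node = node.parent
    return "/".join(reversed(parts))

def find_data_regions(root, min_records=2):
    """{path: text} for nodes with >= min_records adjacent, identical, non-leaf children."""
    regions, stack = {}, [root]
    while stack:
        node = stack.pop()
        run = 1
        for prev, cur in zip(node.children, node.children[1:]):
            same = bool(prev.children) and prev.signature() == cur.signature()
            run = run + 1 if same else 1
            if run >= min_records:
                regions[path_of(node)] = node.full_text()
                break
        stack.extend(node.children)
    return regions

def tag_paths(node, prefix=""):  # un-indexed tag paths: a cheap structural fingerprint
    paths = set()
    for child in node.children:
        p = prefix + "/" + child.tag
        paths |= {p} | tag_paths(child, p)
    return paths

def cluster_pages(roots, threshold=0.6):
    """Greedy clustering by Jaccard similarity of tag-path sets (hypothetical threshold)."""
    clusters = []  # (representative path set, member page indices)
    for i, root in enumerate(roots):
        paths = tag_paths(root)
        for rep, members in clusters:
            if len(paths & rep) / max(len(paths | rep), 1) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((paths, [i]))
    return [members for _, members in clusters]

def label_regions(roots):
    """Regions on every page of a cluster: 'static' if the text never changes, else 'dynamic'."""
    per_page = [find_data_regions(r) for r in roots]
    common = set(per_page[0]).intersection(*per_page[1:])
    return {p: "static" if len({pg[p] for pg in per_page}) == 1 else "dynamic"
            for p in sorted(common)}

PAGES = [  # two hypothetical pages from one template: same menu, different products
    "<html><body><ul><li><a>Home</a></li><li><a>News</a></li></ul><table><tr><td>Phone A</td>"
    "<td>$100</td></tr><tr><td>Phone B</td><td>$200</td></tr></table></body></html>",
    "<html><body><ul><li><a>Home</a></li><li><a>News</a></li></ul><table><tr><td>Laptop C</td>"
    "<td>$900</td></tr><tr><td>Laptop D</td><td>$950</td></tr></table></body></html>",
]
```

With the two sample pages, `cluster_pages([parse(p) for p in PAGES])` places both pages in one cluster, and `label_regions` on that cluster returns `{'html[0]/body[0]/table[1]': 'dynamic', 'html[0]/body[0]/ul[0]': 'static'}`: the repeated menu is identical on both pages, while the product table changes. Aligning regions by indexed tag path is the simplest possible matching rule; it fails when the number of preceding siblings differs between pages, so a practical cross-page matcher would need a more tolerant alignment.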

CLC Number: TP311

HUANG Jianbin; JI Hongbing; SUN Heli. Dynamical Data Regions Identification and Extraction in Web Pages[J]. Computer Engineering, 2007, 33(11): 53-55,5.


URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2007.11.020

http://www.ecice06.com/EN/Y2007/V33/I11/53
