Study and Application of Automation Extraction Technology from Web Authoritative Information

doi:10.3969/j.issn.1000-3428.2008.13.020

Computer Engineering ›› 2008, Vol. 34 ›› Issue (13): 54-55,6.

• Software Technology and Database •

LI Jing, YUAN Xiao-hua, SHEN Xiao-jing

(1. School of Information, Shanghai Fishery University, Shanghai 200090; 2. School of Telecommunication, Tongji University, Shanghai 201804)

Received:1900-01-01 Revised:1900-01-01 Online:2008-07-05 Published:2008-07-05

Abstract: Although the WWW provides a large amount of information for every field, how to accurately extract the authoritative information for a given field has become a hot research topic. This paper proposes a multi-factor assessment model for judging Web pages; using the model, the authority value of a Web page can be computed reasonably. It also presents a table-based phrase tree method that extracts the data of interest from tables automatically. An example shows that the method extracts authoritative information accurately and automatically.

Key words: data extraction, Web data mining, phrase tree, multi-factor assessment, table

CLC Number: TP311.132

LI Jing; YUAN Xiao-hua; SHEN Xiao-jing. Study and Application of Automation Extraction Technology from Web Authoritative Information[J]. Computer Engineering, 2008, 34(13): 54-55,6.

URL: https://www.ecice06.com/EN/Y2008/V34/I13/54
