Abstract:
Although the WWW provides a large amount of information for every field, how to accurately extract the authoritative information for a given field has become an active research topic. This paper proposes a multiple factors assessment model for judging Web pages; with this model, the authority value of a Web page can be computed reasonably. It also presents a table-based phrase tree method that extracts the data of interest automatically. An example shows that the method extracts authoritative information accurately and automatically.
Key words:
data extraction,
Web data mining,
phrase tree,
multiple factors assessment,
table
摘要: WWW为各行各业提供了大量的信息,但如何准确地从这些信息中提取出相关领域的权威信息是目前研究的热点问题之一。该文提出评判网站信息的多因素综合评估模型,该模型对网站的权威值进行合理计算,给出基于表格数据的语法树模型,完成了表格数据的自动提取。通过实例证明,该方法很好地解决了权威信息的准确和自动提取。
关键词:
数据提取,
Web数据挖掘,
语法树,
多因素综合评估,
表格
LI Jing; YUAN Xiao-hua; SHEN Xiao-jing. Study and Application of Automation Extraction Technology from Web Authoritative Information[J]. Computer Engineering, 2008, 34(13): 54-55,6.
李 净;袁小华;沈晓晶. Web权威信息自动提取技术的研究及应用[J]. 计算机工程, 2008, 34(13): 54-55,6.