Computer Engineering, 2008, Vol. 34, Issue (13): 54-55, 6. doi: 10.3969/j.issn.1000-3428.2008.13.020

• Software Technology and Database •

Study and Application of Automation Extraction Technology from Web Authoritative Information

LI Jing, YUAN Xiao-hua, SHEN Xiao-jing

  (1. School of Information, Shanghai Fishery University, Shanghai 200090; 2. School of Telecommunication, Tongji University, Shanghai 201804)
  • Online: 2008-07-05 Published: 2008-07-05

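Abstract: The WWW supplies large amounts of information to every field, yet accurately extracting the authoritative information of a given domain from it remains an active research problem. This paper proposes a multi-factor comprehensive assessment model for judging website information, which computes an authority value for each website, and a syntax tree model based on tabular data, which performs the automatic extraction of table data. A worked example shows that the method extracts authoritative information accurately and automatically.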
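The abstract only outlines the two steps (rank sites by an authority value, then extract table data through a syntax tree), so the following is a minimal sketch under stated assumptions, not the authors' method. The three factors (`source_type`, `link_popularity`, `freshness`), their weights, and the weighted-average form of the score are hypothetical, as are the names `authority_value`, `TableTreeBuilder` and `table_rows`. The "syntax tree" here is simply the nested table/row/cell structure built with Python's standard `html.parser`.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical factors and weights; the abstract does not name the model's factors.
WEIGHTS = {"source_type": 0.4, "link_popularity": 0.35, "freshness": 0.25}


def authority_value(factors, weights=WEIGHTS):
    """Weighted average of factor scores, each assumed to be normalised to [0, 1]."""
    total = sum(weights.values())
    return sum(w * factors.get(name, 0.0) for name, w in weights.items()) / total


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    text: str = ""


class TableTreeBuilder(HTMLParser):
    """Builds a tree holding only table, tr, th and td elements."""

    KEEP = {"table", "tr", "th", "td"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag not in self.KEEP:
            return
        # HTML allows </td> and </tr> to be omitted, so close an open cell (or row) implicitly.
        if tag in ("td", "th", "tr"):
            while self.stack[-1].tag in ("td", "th") or (tag == "tr" and self.stack[-1].tag == "tr"):
                self.stack.pop()
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in self.KEEP:
            return
        # Pop back to the nearest open element with the same tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Only text inside a cell is kept; text in non-table tags (e.g. <b>) joins its cell.
        if self.stack[-1].tag in ("td", "th"):
            self.stack[-1].text += data


def table_rows(html):
    """Return the whitespace-normalised cell texts of the first top-level table."""
    builder = TableTreeBuilder()
    builder.feed(html)
    builder.close()
    tables = [n for n in builder.root.children if n.tag == "table"]
    if not tables:
        return []
    return [
        [" ".join(cell.text.split()) for cell in row.children if cell.tag in ("td", "th")]
        for row in tables[0].children
        if row.tag == "tr"
    ]


if __name__ == "__main__":
    # Hypothetical candidate pages: (factor scores, page HTML).
    pages = {
        "site_a": ({"source_type": 1.0, "link_popularity": 0.6, "freshness": 0.8},
                   "<table><tr><th>Item<th>Value</tr><tr><td>A<td>12.5</table>"),
        "site_b": ({"source_type": 0.3, "link_popularity": 0.9, "freshness": 0.5},
                   "<table><tr><td>stale</td></tr></table>"),
    }
    # Pick the most authoritative page, then extract its table.
    best = max(pages, key=lambda name: authority_value(pages[name][0]))
    print(best, table_rows(pages[best][1]))
```

Run as a script, it prints `site_a [['Item', 'Value'], ['A', '12.5']]`: `site_a` scores 0.81 against 0.56 for `site_b`. Any real deployment would replace the hypothetical factors and weights with those of the paper's assessment model.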

Key words: data extraction, Web data mining, syntax tree, multi-factor comprehensive assessment, table
