Computer Engineering ›› 2010, Vol. 36 ›› Issue (7): 66-67,7. doi: 10.3969/j.issn.1000-3428.2010.07.023

• Software Technology and Database •

Form Filling Strategy Based on Entrance Query in Deep Web

MA Jian-hua¹, LI Sai-hong², XU Lan-lan²

  1. Office of Academic Affairs, Nanjing University of Posts and Telecommunications, Nanjing 210043
  2. Department of Educational Technology, Nanjing Normal University, Nanjing 210097
  • Online: 2010-04-05 Published: 2010-04-05

Abstract: To address the problem that the large volume of data in the deep Web cannot be indexed by traditional search engines, this paper uses improved heuristic rules to identify form query entrances in extracted Web pages, and applies an improved semantics-based similarity matching algorithm to fill in form content when matching form labels against candidate content. Experimental results show that form labels are extracted with 94.23% accuracy, the matching success rate is 88.83%, and the form filling success rate is 95.43%.

Key words: deep Web, entrance query, form filling
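
The abstract does not spell out the heuristic rules or the similarity measure, so the following is only an illustrative sketch of the general pipeline (find candidate query forms, match labels, pick fill values), not the authors' algorithm. The "no password field" rule, the pairing of labels and inputs by position, the 0.6 threshold, and the synonym and value tables are all assumptions. Character-level similarity from Python's `difflib` stands in for the paper's semantics-based similarity.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect every <form> with its label texts and text-input names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._form = None        # form currently being parsed, if any
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {"labels": [], "inputs": [], "has_password": False}
        elif self._form is None:
            return
        elif tag == "label":
            self._in_label = True
        elif tag == "input":
            kind = (attrs.get("type") or "text").lower()
            if kind == "password":
                self._form["has_password"] = True
            elif kind in ("text", "search"):
                self._form["inputs"].append(attrs.get("name") or "")

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None

    def handle_data(self, data):
        if self._form is not None and self._in_label and data.strip():
            self._form["labels"].append(data.strip())


def extract_forms(html):
    """Parse an HTML string and return one dict per form found."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


def is_query_entrance(form):
    """Assumed heuristic: a query form has text inputs and no password field."""
    return bool(form["inputs"]) and not form["has_password"]


def match_label(label, synonyms, threshold=0.6):
    """Return the attribute whose synonym best matches the label, or None."""
    text = label.lower().strip(" :*")
    best_attr, best_score = None, 0.0
    for attr, names in synonyms.items():
        for name in names:
            score = SequenceMatcher(None, text, name).ratio()
            if score > best_score:
                best_attr, best_score = attr, score
    return best_attr if best_score >= threshold else None


def fill_form(form, synonyms, values):
    """Pair labels with inputs by position (an assumption) and choose values."""
    filled = {}
    for label, name in zip(form["labels"], form["inputs"]):
        attr = match_label(label, synonyms)
        if attr is not None and attr in values:
            filled[name] = values[attr]
    return filled
```

A crawler built along these lines would call `extract_forms` on each fetched page, keep only the forms accepted by `is_query_entrance`, and submit the mapping returned by `fill_form`. A real system would replace the string-similarity step with a semantic resource such as a thesaurus or ontology.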
