
Computer Engineering, 2009, Vol. 35, Issue (5): 53-55. doi: 10.3969/j.issn.1000-3428.2009.05.018

• Software Technology and Database •

Zone Tree Model Based on DOM Tree and Recursive X-Y Cut Algorithm

HUANG Xin, SANG Nan   

  1. (School of Software, University of Electronic Science and Technology of China, Chengdu 610054)

  • Online: 2009-03-05 Published: 2009-03-05

Abstract: Based on an analysis of the characteristics of the DOM tree, this paper presents a new Zone tree model built on the DOM tree and the recursive X-Y cut algorithm, which generates the Zone tree from the geometric layout of a web page. The paper illustrates the model's advantages over the DOM tree when applied to bibliographic data retrieval and describes an algorithm for constructing the Zone tree. The model is mainly intended for extracting bibliographic data from online articles. It is fast and highly accurate, outperforming the DOM tree structure used by most browsers.
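The abstract does not spell out the construction algorithm. Purely as an illustration of the recursive X-Y cut idea, the Python sketch below assumes each input box is the rendered bounding rectangle of a DOM leaf element (for example, as reported by a browser's layout engine). It recursively splits a region at the widest whitespace gap, horizontal or vertical, that is at least `min_gap` pixels wide. `Box`, `Zone`, `xy_cut`, `dump`, the sample page, and the 10-pixel threshold are hypothetical names and values, not taken from the paper. The sketch also ignores how the authors combine the cuts with the DOM hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical input: rendered rectangle of one DOM leaf element (pixels)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Zone:
    """Hypothetical Zone-tree node: the boxes in a region plus its sub-zones."""
    boxes: List[Box]
    children: List[Zone] = field(default_factory=list)
    cut: Optional[str] = None  # "y": split into rows, "x": split into columns


def _split(boxes: List[Box], lo: Callable[[Box], float], hi: Callable[[Box], float],
           min_gap: float) -> Tuple[List[List[Box]], float]:
    """Group boxes whose projections on one axis are separated by empty gaps
    of at least min_gap; also return the widest such gap (0.0 if none)."""
    ordered = sorted(boxes, key=lo)
    groups = [[ordered[0]]]
    reach = hi(ordered[0])  # furthest coordinate covered so far on this axis
    widest = 0.0
    for b in ordered[1:]:
        gap = lo(b) - reach
        if gap >= min_gap:
            groups.append([b])  # whitespace band found: start a new group
            widest = max(widest, gap)
        else:
            groups[-1].append(b)
        reach = max(reach, hi(b))
    return groups, widest


def xy_cut(boxes: List[Box], min_gap: float = 10.0) -> Zone:
    """Recursively cut a region at its widest whitespace gap (sketch only)."""
    zone = Zone(boxes=list(boxes))
    if len(boxes) <= 1:
        return zone
    rows, row_gap = _split(boxes, lambda b: b.y0, lambda b: b.y1, min_gap)
    cols, col_gap = _split(boxes, lambda b: b.x0, lambda b: b.x1, min_gap)
    # Only cuts that actually separate the region are candidates.
    candidates = [(gap, axis, groups)
                  for gap, axis, groups in ((row_gap, "y", rows), (col_gap, "x", cols))
                  if len(groups) > 1]
    if not candidates:
        return zone  # no wide enough gap in either direction: leaf zone
    # Take the wider gap; max() keeps the first candidate (row cut) on ties.
    _, zone.cut, groups = max(candidates, key=lambda c: c[0])
    zone.children = [xy_cut(g, min_gap) for g in groups]
    return zone


def dump(zone: Zone, depth: int = 0) -> List[str]:
    """Render the zone tree as indented lines; leaves list their box labels."""
    if not zone.children:
        return ["  " * depth + "leaf: " + ", ".join(b.label for b in zone.boxes)]
    lines = ["  " * depth + f"{zone.cut}-cut"]
    for child in zone.children:
        lines.extend(dump(child, depth + 1))
    return lines


# Hypothetical article page: header, left navigation, main column, footer.
page = [
    Box("header", 0, 0, 800, 60),
    Box("nav", 0, 80, 150, 500),
    Box("title", 180, 80, 800, 120),
    Box("authors", 180, 135, 800, 155),
    Box("abstract", 180, 180, 800, 500),
    Box("footer", 0, 520, 800, 560),
]

if __name__ == "__main__":
    print("\n".join(dump(xy_cut(page))))
```

For this sample page the sketch produces a row cut (header, body, footer), then a column cut separating `nav` from the main column, then a row cut over the title, authors, and abstract. Always cutting at the wider gap is only one common heuristic; the paper may use a different splitting rule or stopping criterion.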

Key words: HTML document, DOM tree, recursive X-Y cut algorithm, Zone tree
