
Computer Engineering ›› 2008, Vol. 34 ›› Issue (7): 53-55. doi: 10.3969/j.issn.1000-3428.2008.07.018

• Software Technology and Database •

Pruning Method of URL Index Model Based on CPat-Tree

ZHAO Ze-yu, YAN Hua

  1. (Informatization Office, Fudan University, Shanghai 200433)
  • Online: 2008-04-05 Published: 2008-04-05

Abstract: Very large numbers of URLs make the indexes of Internet content filtering systems inefficient. This paper proposes a pruning algorithm for an improved, CPat-Tree-based storage model of URL rating information. The algorithm clusters keys by their similarity and traverses the storage array directly to merge similar leaf nodes, which reduces the space occupied by the index and improves query efficiency. The change in storage space before and after pruning depends on the similarity of the keys, which gives the method good scalability.

Key words: CPat-Tree method, pruning, URL database, content filtering
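
The abstract does not give the similarity measure or the array layout, so the following is only a minimal sketch of the general idea (cluster keys by similarity, then merge similar leaves in one pass over the array), not the authors' algorithm. It assumes that leaves sit in a flat array, that similarity is the shared-prefix length divided by the longer key's length, and that only leaves with the same rating are merged. The names `Leaf`, `similarity`, `prune`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from os.path import commonprefix


@dataclass
class Leaf:
    key: str     # URL key stored in the leaf (assumed layout)
    rating: str  # rating/category label attached to the URL


def similarity(a: str, b: str) -> float:
    """Assumed measure: shared-prefix length over the longer key's length."""
    if not a or not b:
        return 0.0
    return len(commonprefix([a, b])) / max(len(a), len(b))


def prune(leaves: list[Leaf], threshold: float = 0.6) -> list[Leaf]:
    """One pass over the key-sorted leaf array, merging similar neighbours."""
    pruned: list[Leaf] = []
    for leaf in sorted(leaves, key=lambda item: item.key):
        last = pruned[-1] if pruned else None
        if (last is not None
                and last.rating == leaf.rating
                and similarity(last.key, leaf.key) >= threshold):
            # Keep one representative per cluster: the shared prefix.
            last.key = commonprefix([last.key, leaf.key])
        else:
            # Copy the leaf so the caller's input is not mutated later.
            pruned.append(Leaf(leaf.key, leaf.rating))
    return pruned
```

Under these assumptions, `example.com/news/a` and `example.com/news/b` with the same rating collapse into the single key `example.com/news/`, while a neighbour with a different rating stays a separate leaf. A lookup over the pruned array would then need longest-prefix matching rather than exact matching, which is a design choice of this sketch.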
