Computer Engineering ›› 2008, Vol. 34 ›› Issue (2): 75-77. doi: 10.3969/j.issn.1000-3428.2008.02.025

• Software Technology and Database •

Efficient On-line Index Update Strategy Based on a Hybrid Structure

ZHAO Liang

  1. (School of Software, Shanghai Jiao Tong University, Shanghai 200240)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-01-20 Published: 2008-01-20

Abstract: Inverted index structures are widely used in information retrieval systems, but off-line index construction and update methods are not suitable for on-line update. This paper studies on-line index update methods, analyzes the strengths and weaknesses of the re-merge, in-place, and hybrid update strategies, and proposes an improved hybrid update strategy that uses a multilevel structure and combines the advantages of in-place and re-merge updates. Disk access complexity is used to measure the performance of the update strategies, and several common update strategies and the hybrid strategy are analyzed both theoretically and experimentally on very large text collections. Both the theoretical and the experimental results show that the improved hybrid update strategy has better performance.

Key words: inverted index, update strategy, inverted index structure
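
The general idea behind a hybrid update, as named in the abstract, can be illustrated with a toy sketch. This is not the paper's improved multilevel strategy, whose details are not given here. It assumes one common reading of "hybrid": short posting lists are rewritten together on each flush (re-merge), while lists that have grown past a cutoff are only appended to (in-place). The class name, both thresholds, and the `postings_written` counter (a crude stand-in for disk access cost) are hypothetical.

```python
from collections import defaultdict


class HybridIndex:
    """Toy hybrid updater: short lists are re-merged, long lists are appended in place."""

    def __init__(self, long_list_threshold=4, buffer_limit=6):
        self.long_list_threshold = long_list_threshold  # hypothetical cutoff for "long" lists
        self.buffer_limit = buffer_limit                # postings held in memory before a flush
        self.buffer = defaultdict(list)  # in-memory postings: term -> [doc_id, ...]
        self.buffered = 0
        self.merged = {}    # simulated on-disk segment of short lists (rewritten on flush)
        self.in_place = {}  # simulated on-disk long lists (appended to on flush)
        self.postings_written = 0  # crude stand-in for disk cost (assumption)

    def add_document(self, doc_id, terms):
        for term in set(terms):
            self.buffer[term].append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self.flush()

    def flush(self):
        # In-place part: append new postings to lists already stored as long lists.
        for term in list(self.buffer):
            if term in self.in_place:
                new_postings = self.buffer.pop(term)
                self.in_place[term].extend(new_postings)
                self.postings_written += len(new_postings)
        # Re-merge part: rewrite the whole short-list segment together with the buffer.
        new_merged = {}
        for term in set(self.merged) | set(self.buffer):
            postings = self.merged.get(term, []) + self.buffer.get(term, [])
            self.postings_written += len(postings)
            if len(postings) >= self.long_list_threshold:
                self.in_place[term] = postings  # promote to an in-place list
            else:
                new_merged[term] = postings
        self.merged = new_merged
        self.buffer.clear()
        self.buffered = 0

    def lookup(self, term):
        on_disk = self.in_place.get(term) or self.merged.get(term, [])
        return on_disk + self.buffer.get(term, [])
```

In this sketch a frequent term is rewritten in full only until its list crosses the cutoff. After that, each flush costs only the newly appended postings for that term, which is the trade-off a hybrid scheme aims for.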

CLC Number: