作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2008, Vol. 34 ›› Issue (8): 53-55. doi: 10.3969/j.issn.1000-3428.2008.08.018

• 软件技术与数据库 • 上一篇    下一篇

FM-index分块并行算法及其实现

李开士1,2,3,张云泉2,3,李玉成2   

  1. (1. 中国科学院研究生院,北京 100080;2. 中国科学院软件研究所并行计算实验室,北京 100080;3. 中国科学院计算机科学国家重点实验室,北京 100080)
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2008-04-20 发布日期:2008-04-20

Parallelization of Blocked FM-index Algorithm and Its Implementation

LI Kai-shi Li1,2,3, ZHANG Yun-quan Zhang2,3, LI Yu-cheng Li2   

  1. (1. Graduate University ofSchool ,of Chinese Academy of Sciences, Beijing 100080; 2 . Lab. of Parallel Computing, Institute of Software, Chinese Academy of Sciences, Beijing 100080; 3. State Key Laboratory of Computer Sciences, Chinese Academy of SciencesCAS, Beijing 100080)
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-04-20 Published:2008-04-20

摘要: 在查询海量数据时,有压缩和索引两种方法来提高速度,。该文结合这两种方法提出了压缩查询的方法。FM-index是一种自索引的全文查询算法,。这种算法存在内存占用过大的问题,并且对于复杂的查询效率也不理想,。该文于是提出了分块FM-index算法,,并在分块的基础上采用MPI对该分块算法进行了并行化,。成功地解决了内存占用过多的问题,并达到了较好的并行效率。

关键词: 压缩, 自索引, FM-index算法, 分块, 并行

Abstract: When dealing with massive volume data, there’ are two ways to achieve high performance: —one is to compress and the other one is to build index. Combining these two waysmethods, compressed query is emergingproposed. FM-index is such a compressed self-index algorithm used for full-text query. We found that The algorithm will occupiesy a large amount of main memory and is unable to handle complex query efficiently. To deal with these problems, this paper we proposesd a blocked version FM-index algorithm and parallelizesd it using MPI. The blocked algorithm greatly reducesd its memory usage, while the parallel version of blocked FM-index algorithm achievesd acceptable scalability.

Key words: compression, self-index, FM-index, blocking, parallelization

中图分类号: