摘要: 在查询海量数据时,有压缩和索引两种方法来提高速度,。该文结合这两种方法提出了压缩查询的方法。FM-index是一种自索引的全文查询算法,。这种算法存在内存占用过大的问题,并且对于复杂的查询效率也不理想,。该文于是提出了分块FM-index算法,,并在分块的基础上采用MPI对该分块算法进行了并行化,。成功地解决了内存占用过多的问题,并达到了较好的并行效率。
关键词:
压缩,
自索引,
FM-index算法,
分块,
并行
Abstract: When dealing with massive volume data, there’ are two ways to achieve high performance: —one is to compress and the other one is to build index. Combining these two waysmethods, compressed query is emergingproposed. FM-index is such a compressed self-index algorithm used for full-text query. We found that The algorithm will occupiesy a large amount of main memory and is unable to handle complex query efficiently. To deal with these problems, this paper we proposesd a blocked version FM-index algorithm and parallelizesd it using MPI. The blocked algorithm greatly reducesd its memory usage, while the parallel version of blocked FM-index algorithm achievesd acceptable scalability.
Key words:
compression,
self-index,
FM-index,
blocking,
parallelization
中图分类号:
李开士;张云泉;李玉成. FM-index分块并行算法及其实现[J]. 计算机工程, 2008, 34(8): 53-55.
LI Kai-shi Li; ZHANG Yun-quan Zhang; LI Yu-cheng Li. Parallelization of Blocked FM-index Algorithm and Its Implementation[J]. Computer Engineering, 2008, 34(8): 53-55.