Abstract: Lucene is a high-performance, easily extensible full-text information retrieval library written in Java that makes it straightforward to add full-text indexing and search capabilities to applications. This paper discusses the vector space model used in Lucene, analyzes the structure of its index files and its search ranking algorithm, and describes its compression algorithm. An experiment verifies Lucene's index-building process.
Key words:
Lucene,
vector space model,
ranking algorithm,
information retrieval
CLC number:
ZHOU Deng-peng (周登朋); XIE Kang-lin (谢康林). Lucene Search Engine[J]. Computer Engineering (计算机工程), 2007, 33(18): 95-96,1.