Computer Engineering ›› 2009, Vol. 35 ›› Issue (6): 44-46. doi: 10.3969/j.issn.1000-3428.2009.06.015

• Software Technology and Database •

Query Expansion Algorithm Based on Association Rules and Cluster Algorithm

LI Da-gao (李大高)1, CHENG Xian-yi (程显毅)1, ZHANG Dong-hui (张冬慧)2

  (1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013; 2. School of Education Technology, Beijing Normal University, Beijing 100875)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-03-20 Published: 2009-03-20

Abstract: To address the mismatch between query keywords and the words used in documents in information retrieval, this paper proposes a query expansion algorithm that combines association rules with a clustering algorithm. In the first stage, the algorithm mines association rules from the top N documents of the initial query result, extracts the rules that contain an initial query term to build a rule base, and selects from it the K words most strongly associated with the query words as expansion terms. These terms are combined with the initial query to form a new query, which is then run again. In the second stage, the algorithm applies cluster analysis to the new query result, computes a final relevance score for every document in it, and re-ranks the documents by that score. Experimental results show that the algorithm achieves better retrieval performance than using association rules alone or the clustering algorithm alone.

Key words: information retrieval, query expansion, association rules, clustering algorithm
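
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring function, the rule measures, the clustering method, or the formula for the final relevance score, so everything here is an assumption. In particular, the term-count scorer `retrieve`, the support/confidence thresholds, the single-pass Jaccard clustering, the weight `alpha`, and the toy corpus `DOCS` are all hypothetical choices made only to show the control flow.

```python
from collections import Counter


def retrieve(query, docs):
    """Rank documents by total occurrences of query terms (stand-in scorer)."""
    scored = []
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = sum(counts[t] for t in query)
        if score > 0:
            scored.append((doc_id, float(score)))
    return sorted(scored, key=lambda p: (-p[1], p[0]))


def expansion_terms(query, top_docs, k, min_support=0.5):
    """Stage 1: mine rules q -> t over the top documents, keep the K best t.

    Assumption: rule strength is the confidence of q -> t, measured by
    document-level co-occurrence within the top documents.
    """
    doc_sets = [set(tokens) for tokens in top_docs]
    n = len(doc_sets)
    best = {}
    for q in query:
        with_q = [s for s in doc_sets if q in s]
        if not with_q:
            continue
        candidates = set().union(*with_q) - set(query)
        for t in candidates:
            both = sum(1 for s in with_q if t in s)
            support = both / n
            confidence = both / len(with_q)
            if support >= min_support:
                best[t] = max(best.get(t, 0.0), confidence)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(doc_ids, docs, threshold=0.3):
    """Single-pass clustering: join the first cluster whose seed is similar."""
    clusters = []  # each cluster is a list of doc ids; element 0 is the seed
    for doc_id in doc_ids:
        tokens = set(docs[doc_id])
        for members in clusters:
            if jaccard(tokens, set(docs[members[0]])) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def rerank(ranked, docs, alpha=0.5, threshold=0.3):
    """Stage 2: blend each document's score with its cluster's mean score."""
    scores = dict(ranked)
    final = {}
    for members in cluster([d for d, _ in ranked], docs, threshold):
        cluster_mean = sum(scores[d] for d in members) / len(members)
        for d in members:
            final[d] = alpha * scores[d] + (1 - alpha) * cluster_mean
    return sorted(final.items(), key=lambda p: (-p[1], p[0]))


def expand_and_rerank(query, docs, n=3, k=2):
    """Run both stages: expand the query from the top N, then re-rank."""
    first = retrieve(query, docs)
    top_docs = [docs[d] for d, _ in first[:n]]
    new_query = list(query) + expansion_terms(query, top_docs, k)
    return new_query, rerank(retrieve(new_query, docs), docs)


# Hypothetical toy corpus: d3 never mentions "car", so the initial query misses it.
DOCS = {
    "d1": ["car", "engine", "repair"],
    "d2": ["car", "engine", "fuel"],
    "d3": ["automobile", "engine", "repair"],
    "d4": ["banana", "fruit"],
}

initial = retrieve(["car"], DOCS)
new_query, result = expand_and_rerank(["car"], DOCS)
```

In this toy run the initial query matches only the two documents containing "car", while the expanded query also reaches the document that uses "automobile", which is the word-mismatch case the abstract targets. Real systems would replace the stand-in scorer and clustering step with the retrieval model and clustering algorithm actually used in the paper.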

CLC number: