Query Expansion Algorithm Based on Association Rules and Cluster Algorithm

doi:10.3969/j.issn.1000-3428.2009.06.015

Computer Engineering, 2009, Vol. 35, Issue 6: 44-46.

Query Expansion Algorithm Based on Association Rules and Cluster Algorithm

LI Da-gao1, CHENG Xian-yi1, ZHANG Dong-hui2

(1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013; 2. School of Education Technology, Beijing Normal University, Beijing 100875)

Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-03-20 Published: 2009-03-20

Abstract

Abstract: To address the mismatch between query keywords and the words used in documents in information retrieval, this paper proposes a query expansion algorithm that combines association rules with a clustering algorithm. In the first stage, the algorithm mines association rules from the top N documents of the initial query result, extracts the rules that contain initial query terms to build a rule base, and selects the K words most strongly associated with the query words as expansion terms. These terms are combined with the initial query to form a new query, which is then submitted again. In the second stage, the new query result is clustered, the final relevance of each document in the result is computed, and the documents are re-ranked by final relevance. Experimental results show that the algorithm achieves better retrieval performance than using the association rule algorithm alone or the clustering algorithm alone.

Key words: information retrieval, query expansion, association rules, cluster algorithm
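
The two stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the support and confidence thresholds, the clustering method, or the final-relevance formula, so everything below is an assumption made for illustration. The function names `expansion_terms` and `rerank`, the parameters `min_support`, `min_confidence`, `threshold`, and `alpha`, the single-pass (leader) clustering, and the blended score are all hypothetical. Documents are assumed to be pre-tokenized lists of terms.

```python
from collections import Counter
from math import sqrt


def expansion_terms(top_docs, query_terms, k, min_support=0.3, min_confidence=0.5):
    """Stage 1 sketch: choose K expansion terms from rules of the form q -> t.

    Each of the top-N documents is treated as one transaction (a set of terms).
    The thresholds and the use of confidence as the association score are
    assumptions, not values taken from the paper.
    """
    transactions = [set(doc) for doc in top_docs]
    n = len(transactions)
    scores = {}
    for q in query_terms:
        with_q = [t for t in transactions if q in t]
        if not with_q:
            continue
        # Count how often each non-query term co-occurs with q.
        co_counts = Counter(
            term for t in with_q for term in t if term not in query_terms
        )
        for term, both in co_counts.items():
            support = both / n
            confidence = both / len(with_q)
            if support >= min_support and confidence >= min_confidence:
                scores[term] = max(scores.get(term, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(docs, scores, threshold=0.5, alpha=0.5):
    """Stage 2 sketch: cluster the new result list, then re-rank it.

    Assumed clustering: single-pass, comparing each document with the first
    member (leader) of every existing cluster. Assumed final relevance:
    alpha * own retrieval score + (1 - alpha) * mean score of its cluster.
    Returns document indices ordered by descending final relevance.
    """
    vectors = [Counter(doc) for doc in docs]
    clusters = []  # lists of document indices; the first index is the leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    final = {}
    for members in clusters:
        cluster_mean = sum(scores[i] for i in members) / len(members)
        for i in members:
            final[i] = alpha * scores[i] + (1 - alpha) * cluster_mean
    return sorted(final, key=lambda i: (-final[i], i))
```

In a full pipeline under these assumptions, the terms returned by `expansion_terms` would be appended to the initial query, the expanded query would be run through the retrieval system again, and the resulting documents and their retrieval scores would be passed to `rerank`.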

CLC number: TP391

LI Da-gao; CHENG Xian-yi; ZHANG Dong-hui. Query Expansion Algorithm Based on Association Rules and Cluster Algorithm[J]. Computer Engineering, 2009, 35(6): 44-46.

http://www.ecice06.com/CN/Y2009/V35/I6/44

