
计算机工程 ›› 2007, Vol. 33 ›› Issue (18): 40-41,4. doi: 10.3969/j.issn.1000-3428.2007.18.014

• 软件技术与数据库 •

基于模糊集的主题提取和层次发现算法

周红芳1,2, 冯博琴1   

  1. 西安交通大学电子与信息工程学院,西安 710049;2. 西安理工大学计算机学院,西安 710048
  • 出版日期:2007-09-20 发布日期:2007-09-20

Algorithm for Topic Distillation and Hierarchical Exploration Based on New Fuzzy Set

ZHOU Hong-fang1,2, FENG Bo-qin1   

  1. School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an 710049; 2. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048
  • Online: 2007-09-20 Published: 2007-09-20

摘要: 从语义相关性角度分析超链归纳主题搜索(HITS)算法,发现其产生主题漂移的原因在于页面被投影到错误的语义基上,提出了一种基于模糊集的主题提取和层次发现算法(FSTH),通过用户日志扩展查询词,构造符合用户需要的个性化根集和基础集合,达到防止主题漂移的目的。FSTH采用模糊集划分方法,层次地发现与用户查询相关的主题页面集合,利用HITS算法分别计算每个主题页面集合中页面的权威值,返回与查询相关的其他主题权威页面。在14个查询上的实验结果表明,与HITS算法相比,FSTH算法不仅可以减少7%~53%的主题漂移率,而且可以发现与查询相关的多个主题。

关键词: 模糊集, 超链归纳主题搜索, 主题提取, 主题漂移, 查询扩展

Abstract: The hypertext induced topic search (HITS) algorithm is analyzed from the perspective of semantic relevance, and its topic drift is traced to Web pages being projected onto a wrong latent semantic basis. A fuzzy-set-based algorithm for topic distillation and hierarchical exploration (FSTH) is proposed to improve the quality of topic distillation. FSTH expands the query using individual user query logs and constructs a personalized root set and base set, thereby avoiding topic drift. It then applies a fuzzy-set-based hierarchical division method to discover the sets of topic pages related to the user query, and uses HITS to compute the authority value of the pages within each topic set, returning the authority pages of the related topics to the user. Experimental results on 10 queries show that, compared with HITS, FSTH reduces the topic drift rate by 7% to 53% and discovers several related topics for queries that have multiple meanings.

Key words: fuzzy set, hypertext induced topic search, topic distillation, topic drift, query expansion
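
The abstract relies on HITS to score the pages inside each topic set. As a reminder of what that step computes, the following is a minimal sketch of the standard HITS power iteration (Kleinberg's formulation), not the authors' FSTH implementation. The function names `hits` and `normalize`, the edge-list input format, and the fixed iteration count are assumptions made for illustration; query expansion and the fuzzy-set division described above are not modeled.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (assumed helper)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS on a directed graph given as (source, target) links.

    Returns two dictionaries: authority scores and hub scores.
    """
    nodes = sorted({node for edge in edges for node in edge})
    auth = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score of a page: sum of authority scores of the pages it links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Hypothetical toy link graph: pages "a" and "b" both link to "c"; "a" also links to "d".
    links = [("a", "c"), ("b", "c"), ("a", "d")]
    authority, hub_score = hits(links)
    print("authority:", authority)
    print("hub:", hub_score)
```

In an FSTH-style pipeline, a routine like this would presumably be run once per discovered topic page set rather than once over the whole base set, so that each topic yields its own authority ranking.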
