Computer Engineering, 2007, Vol. 33, Issue (02): 152-154. doi: 10.3969/j.issn.1000-3428.2007.02.053

• Artificial Intelligence and Recognition Technology •

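Feature Weight Calculation Between Query and Texts Based on User Evaluation

WU Chunyao, QU Wenlong, YANG Bingru

  1. (School of Information Engineering, University of Science and Technology Beijing, Beijing 100083)
  • Online: 2007-01-20 Published: 2007-01-20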

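Abstract: This paper proposes a method that computes feature weights from a large volume of user evaluations, in order to analyze the similarity between a query string and its search results in a search engine. The method relies entirely on users' "implicit evaluations" of search results: the results a user clicks for a given query reflect the underlying relevance between the query and those results. The proposed method captures this relationship by building a mathematical model of the problem and computing the feature weights with the EM algorithm. Because the model's function is complex and its convergence is hard to establish analytically, simulated annealing is used as a complement to the EM algorithm to verify convergence. Experiments were run on sponsored (bid-ranked) advertisements in the Baidu search engine, with a test sample of 100 advertisements and 144 132 queries. All features converged to the global optimum, and on a sampled subset of the data, similarity retrieval reached a precision of 93.32% and a recall of 87.43%.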
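The abstract does not spell out the model, so the following is only a minimal sketch under an assumed one. Suppose each (query, result) pair carries similarity feature scores f_k in (0, 1), and a user clicks with probability sum_k w_k f_k, where the weights w_k are non-negative and sum to 1. The click log is then a mixture of Bernoulli components with known parameters f_k and unknown mixing weights w_k. EM estimates those weights by alternating two steps: an E-step, which computes each feature's posterior share in explaining each click or non-click, and an M-step, which averages those shares. The model, the function names `simulate` and `em_feature_weights`, and the synthetic data are hypothetical illustrations, not the authors' model or Baidu data.

```python
import math
import random


def simulate(true_w, n, seed=0):
    """Hypothetical synthetic click log under the ASSUMED model P(click=1) = sum_k w_k * f_k."""
    rng = random.Random(seed)
    features, clicks = [], []
    for _ in range(n):
        f = [0.05 + 0.9 * rng.random() for _ in true_w]  # feature scores in [0.05, 0.95)
        p = sum(wk * fk for wk, fk in zip(true_w, f))
        features.append(f)
        clicks.append(1 if rng.random() < p else 0)
    return features, clicks


def em_feature_weights(features, clicks, n_iter=200, tol=1e-10):
    """EM for the mixing weights of a Bernoulli mixture with known components.

    Assumed model (not necessarily the paper's): feature k is a component whose
    click probability is f_k(q, d); unknown weights w_k >= 0, sum w_k = 1, mix them.
    Returns the weights and the log-likelihood recorded at each iteration.
    """
    n_feat = len(features[0])
    w = [1.0 / n_feat] * n_feat  # start from uniform weights
    history = []
    for _ in range(n_iter):
        resp_sum = [0.0] * n_feat
        loglik = 0.0
        for f, c in zip(features, clicks):
            # Probability of the observed outcome (click or no click) under each component.
            comp = [fk if c else 1.0 - fk for fk in f]
            joint = [wk * pk for wk, pk in zip(w, comp)]
            total = sum(joint)
            loglik += math.log(total)
            # E-step: posterior share of each feature in explaining this observation.
            for k in range(n_feat):
                resp_sum[k] += joint[k] / total
        history.append(loglik)  # log-likelihood of the current (pre-update) weights
        # M-step: each new weight is the average responsibility of its feature.
        w = [r / len(clicks) for r in resp_sum]
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return w, history


if __name__ == "__main__":
    feats, clicks = simulate([0.6, 0.3, 0.1], 3000)
    weights, hist = em_feature_weights(feats, clicks)
    print("estimated weights:", [round(x, 3) for x in weights])
    print("iterations:", len(hist))
```

Under this particular assumed model, the log-likelihood is concave in w, so EM's monotone ascent ends at the global maximum. A more complex model, such as the one the abstract describes, need not have that property, which is why an independent check of convergence can be useful.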

Key words: Web page ranking, Feature weight, EM algorithm, Simulated annealing algorithm
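The abstract uses simulated annealing to check that the feature weights reach a global optimum. The sketch below illustrates that kind of check on an assumed model in which the click probability is sum_k w_k f_k, with the weights on the simplex. It anneals over the weights using a Metropolis acceptance rule and a linear cooling schedule. It then compares the best log-likelihood found against an exhaustive grid search over three weights. The model, the cooling schedule, the move type (shifting mass between two weights), the synthetic data, and all function names (`simulate`, `log_likelihood`, `anneal_weights`, `grid_search`) are hypothetical choices for illustration, not the paper's procedure.

```python
import math
import random


def simulate(true_w, n, seed=0):
    """Hypothetical synthetic click log under the ASSUMED model P(click=1) = sum_k w_k * f_k."""
    rng = random.Random(seed)
    features, clicks = [], []
    for _ in range(n):
        f = [0.05 + 0.9 * rng.random() for _ in true_w]  # feature scores in [0.05, 0.95)
        p = sum(wk * fk for wk, fk in zip(true_w, f))
        features.append(f)
        clicks.append(1 if rng.random() < p else 0)
    return features, clicks


def log_likelihood(w, features, clicks):
    """Log-likelihood of the clicks under the assumed mixture model."""
    total = 0.0
    for f, c in zip(features, clicks):
        p = sum(wk * fk for wk, fk in zip(w, f))
        total += math.log(p if c else 1.0 - p)
    return total


def anneal_weights(features, clicks, n_steps=4000, t0=5.0, step0=0.2, seed=1):
    """Simulated annealing over weights on the simplex; returns the best point seen."""
    rng = random.Random(seed)
    k = len(features[0])
    w = [1.0 / k] * k
    cur = log_likelihood(w, features, clicks)
    best_w, best = w[:], cur
    for i in range(n_steps):
        frac = i / n_steps
        temp = t0 * (1.0 - frac) + 1e-6     # linear cooling schedule
        step = step0 * (1.0 - frac) + 1e-3  # proposals shrink as the system cools
        # Move mass between two weights so that sum(w) stays 1.
        a, b = rng.sample(range(k), 2)
        delta = rng.uniform(-step, step)
        cand = w[:]
        cand[a] += delta
        cand[b] -= delta
        if cand[a] < 0.0 or cand[b] < 0.0:
            continue  # reject proposals that leave the simplex
        new = log_likelihood(cand, features, clicks)
        # Metropolis rule (maximizing): accept uphill moves, downhill with prob exp(dL / T).
        if new >= cur or rng.random() < math.exp((new - cur) / temp):
            w, cur = cand, new
            if cur > best:
                best_w, best = w[:], cur
    return best_w, best


def grid_search(features, clicks, resolution=50):
    """Exhaustive reference search over a grid on the 3-weight simplex."""
    best_w, best = None, -math.inf
    for i in range(resolution + 1):
        for j in range(resolution + 1 - i):
            w = [i / resolution, j / resolution, (resolution - i - j) / resolution]
            ll = log_likelihood(w, features, clicks)
            if ll > best:
                best_w, best = w, ll
    return best_w, best


if __name__ == "__main__":
    feats, clicks = simulate([0.6, 0.3, 0.1], 300)
    w_sa, ll_sa = anneal_weights(feats, clicks)
    w_grid, ll_grid = grid_search(feats, clicks)
    print("annealing:", [round(x, 3) for x in w_sa], round(ll_sa, 3))
    print("grid     :", [round(x, 3) for x in w_grid], round(ll_grid, 3))
```

Suppose the annealed optimum agrees with the optimum of the iterative method (or the grid) across several random seeds. That is evidence, though not proof, that the iterative method has not stopped at a poor local optimum.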
