
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Improved PageRank Algorithm Based on User Behavior and Page Analysis

WANG Xuyang, REN Guosheng

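  1. (School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730000, China)
  • Received: 2015-02-04  Published: 2016-02-15  Released online: 2016-01-29
  • About the authors: WANG Xuyang (born 1974), female, associate professor, M.S., research interests include data mining and natural language processing; REN Guosheng, master's student.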

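Abstract: In the classical PageRank algorithm, a page's transition probability is distributed evenly among its outlinks, so new pages, which have few links, generally receive low PR values. The classical algorithm also computes PR values from links alone, without considering page content, and therefore suffers from topic drift. To address these problems, this paper introduces an authoritative factor and a time factor. It counts how many times each page is clicked after a keyword search, iteratively corrects the page's PR value according to its position in the initial ranking, and returns a new ranking. It further analyzes page content with an improved TF-IDF algorithm to measure page relevance and avoid topic drift. Simulation results show that the improved algorithm raises the quality of page ranking: highly relevant pages receive different degrees of weighting through users' own choices, the weighted pages move up in the search results, and the precision of retrieving the pages users need improves.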

Key words: authoritative factor, time factor, topic drift, transition probability, PR value

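For reference, the classical PageRank baseline that the abstract criticizes can be sketched as below. This is a minimal, hypothetical illustration of the textbook power-iteration algorithm, not the authors' improved method (their authoritative factor, time factor, click statistics and TF-IDF step are not modeled). The function name `classical_pagerank`, the damping factor d = 0.85, the stopping tolerance and the treatment of pages without outlinks are all assumptions. It shows the two properties the abstract points to: each page's PR value is split evenly among its outlinks, and a page that nothing links to keeps only the baseline (1 - d)/n.

```python
def classical_pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Textbook PageRank computed by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    d: damping factor (0.85 is a conventional choice, assumed here).
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # uniform starting vector
    for _ in range(max_iter):
        # Random-jump term: every page receives (1 - d) / n.
        new = {p: (1.0 - d) / n for p in pages}
        # Assumed convention: pages without outlinks spread PR over all pages.
        dangling = sum(pr[p] for p in pages if not links.get(p))
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Classical rule: split d * PR(p) evenly among the outlinks.
                share = d * pr[p] / len(outs)
                for q in outs:
                    new[q] += share
        for p in pages:
            new[p] += d * dangling / n
        diff = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if diff < tol:  # stop once the L1 change is negligible
            break
    return pr


# Hypothetical four-page graph: "New" links to "A", but nothing links to "New".
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "New": ["A"]}
ranks = classical_pagerank(links)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```

Because no page links to "New", its value stays at the random-jump baseline (1 - 0.85) / 4 = 0.0375 no matter how many iterations run, which is the low-PR problem for new pages described in the abstract.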
