
Computer Engineering, 2008, Vol. 34, Issue 5: 110-112. doi: 10.3969/j.issn.1000-3428.2008.05.038

• Networks and Communications •

Measure of Distance Among Web Pages Based on Markov Chain

XIONG Zhi¹, GUO Cheng-cheng²

  1. Department of Computer Science, Shantou University, Shantou 515063
  2. School of Electronic Information, Wuhan University, Wuhan 430079
  • Online: 2008-03-05 Published: 2008-03-05

Abstract: The persistent connections of HTTP/1.1 can impose extra overhead on a Web server cluster that uses content-aware request distribution. To reduce this overhead, Web pages that clients frequently access together can be grouped into clusters, with each cluster serving as the unit of document distribution. How to measure the distance between Web pages is the key problem in Web page clustering. This paper proposes a Markov-chain-based measure of the distance between Web pages that takes into account both the time correlation of client accesses and the clients' access paths. An example shows that, compared with a distance measure based on time correlation, the proposed measure more effectively reduces the extra overhead that HTTP/1.1 persistent connections cause after Web pages are clustered.

Key words: Web server cluster, distance among Web pages, Web page clustering, Markov chain
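
The abstract does not give the paper's actual distance formula, so the following is only a rough sketch of how a Markov chain over access paths might yield a page-to-page distance. It assumes that sessions are available as ordered lists of page names, estimates first-order transition probabilities from them, and uses a hypothetical distance of one minus the larger of the two one-step transition probabilities. The names `transition_probabilities` and `page_distance`, the sample sessions, and the distance definition itself are illustrative assumptions, not the authors' method, and the time-correlation component mentioned in the abstract is not modeled here.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from sessions.

    Each session is an ordered list of page names (an assumed input format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count each observed step from one page to the next.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def page_distance(probs, a, b):
    """Hypothetical distance: 1 minus the larger one-step transition probability.

    This definition is an assumption made for illustration only.
    """
    if a == b:
        return 0.0
    p_ab = probs.get(a, {}).get(b, 0.0)
    p_ba = probs.get(b, {}).get(a, 0.0)
    return 1.0 - max(p_ab, p_ba)


# Made-up sessions, purely for demonstration.
sessions = [
    ["index.html", "news.html", "sports.html"],
    ["index.html", "news.html"],
    ["index.html", "about.html"],
]
probs = transition_probabilities(sessions)
print(page_distance(probs, "index.html", "news.html"))
print(page_distance(probs, "index.html", "sports.html"))
```

Under this assumed definition, pages that often follow one another in a session end up close together, which is the property a clustering step would rely on when grouping pages into distribution units.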
