摘要: HTTP/1.1的持续连接特性会给基于内容请求分发的Web集群服务器带来额外的开销。为减少这种开销,可将用户经常一起访问的网页组成簇并以簇为单位来分布文档。如何衡量网页间的距离是网页组簇的关键问题。该文提出一种基于马尔可夫链的衡量网页间距离的方法,该方法同时考虑了用户访问的时间相关性和用户的访问路径。实例表明,与基于时间相关性的衡量网页间距离的方法相比,采用该衡量方法能更有效地减少网页组簇后HTTP/1.1持续连接所带来的额外开销。
关键词: Web集群服务器; 网页间距离; 网页组簇; 马尔可夫链
Abstract: The persistent connection feature of HTTP/1.1 can impose extra overhead on a Web server cluster that uses content-aware request distribution. To reduce this overhead, Web pages that clients frequently access together in the same session are grouped into Web page clusters, and each cluster is treated as the unit of document distribution. How to measure the distance among Web pages is the key problem in Web page clustering. This paper proposes a Markov chain-based measure of the distance among Web pages that takes into account both the time correlation of client accesses and the clients' access paths. An example shows that, compared with a distance measure based on time correlation, the proposed measure more effectively reduces the extra overhead caused by HTTP/1.1 persistent connections after the Web pages are clustered.
Key words: Web server cluster; distance among Web pages; Web page clustering; Markov chain
熊 智;郭成城. 基于马尔可夫链的网页间距离衡量方法[J]. 计算机工程, 2008, 34(5): 110-112.
XIONG Zhi; GUO Cheng-cheng. Measure of Distance Among Web Pages Based on Markov Chain[J]. Computer Engineering, 2008, 34(5): 110-112.
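
The abstract describes a Markov chain-based distance among Web pages but does not give its formula. The following is only a minimal illustrative sketch of the general idea, under stated assumptions: sessions are modeled as a first-order Markov chain over pages, transition probabilities are estimated from observed access paths, and a hypothetical distance `1 - max(P(a→b), P(b→a))` is used as a stand-in. The function names, the sample sessions, and the distance definition are all assumptions made for illustration and are not taken from the paper; time correlation, which the paper also considers, is not modeled here.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from access paths.

    `sessions` is a list of page-name lists, one list per client session.
    Returns a nested dict: probs[src][dst] = P(next page is dst | current is src).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in sessions:
        # Count each consecutive (src, dst) pair along the access path.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def page_distance(probs, a, b):
    """Hypothetical distance (an assumption, not the paper's formula).

    Pages that often follow one another directly are treated as close;
    pages with no observed direct transition get the maximum distance 1.0.
    """
    p_ab = probs.get(a, {}).get(b, 0.0)
    p_ba = probs.get(b, {}).get(a, 0.0)
    return 1.0 - max(p_ab, p_ba)


# Made-up sample sessions, for illustration only.
sessions = [
    ["index", "news", "sports"],
    ["index", "news", "weather"],
    ["index", "about"],
    ["news", "sports"],
]
probs = transition_probabilities(sessions)
```

With such a distance in hand, pages whose pairwise distance falls below a chosen threshold could be grouped into the same cluster, so that requests on one persistent connection are more likely to be served by the same back-end node.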