Abstract: Building on the website logical domain theory proposed by Wen-Syan Li et al., this paper proposes a logical domain core model for websites and a logical domain mining algorithm based on that model. The algorithm operates on the graph structure of a website's hyperlinks to obtain the site's logical domains. In a comparative test against Wen-Syan Li's algorithm, it obtains the same number of logical domains while overcoming the efficiency problems caused by that algorithm's heuristic approach. In separate tests on 4 large-scale websites, it reaches an average logical domain mining precision of 85%.
Key words:
website structure mining,
logical domain,
logical domain core
CLC number:
ZHENG Jiao-ling (郑皎凌). Large Scale Website Logical Domain Mining Algorithm[J]. Computer Engineering (计算机工程), 2008, 34(9): 101-102.