计算机工程 ›› 2021, Vol. 47 ›› Issue (1): 72-78,86.doi: 10.19678/j.issn.1000-3428.0059192

• 人工智能与模式识别 • 上一篇    下一篇

大规模动态图中概率游走约束的节点相似Top-k查询方法

陈泽1, 丁琳琳2,3, 宋宝燕1, 王俊陆1   

  1. 1. 辽宁大学 信息学院, 沈阳 110036;
    2. 山东能源新汶矿业集团有限责任公司, 山东 泰安 271200;
    3. 东北大学 资源与土木工程学院, 沈阳 110004
  • 收稿日期:2020-08-07 修回日期:2020-09-17 发布日期:2020-10-22
  • 作者简介:陈泽(1996-),男,硕士,主研方向为大规模图数据处理技术;丁琳琳,副教授、博士;宋宝燕,教授、博士;王俊陆(通信作者),博士研究生。
  • 基金项目:
    国家自然科学基金(61472169,61502215);中国博士后基金面上项目(2020M672134);辽宁省重点研发计划(2017231011);辽宁省教育厅科学研究项目(LJC201913);沈阳市中青年科技创新人才支持计划(RC180244)。

Node Similarity Top-k Query Method with Probabilistic Walk Constraint in Large-Scale Dynamic Graphs

CHEN Ze1, DING Linlin2,3, SONG Baoyan1, WANG Junlu1   

  1. 1. College of Information, Liaoning University, Shenyang 110036, China;
    2. Shandong Energy Xinwen Mining Group Co., Ltd., Taian, Shandong 271200, China;
    3. School of Resources and Civil Engineering, Northeastern University, Shenyang 110004, China
  • Received:2020-08-07 Revised:2020-09-17 Published:2020-10-22

摘要: 大规模动态图节点相似Top-k查询方法对大规模图查询效率较低,且当图发生动态变化时难以对查询结果进行自适应更新,导致查询结果准确度不高。利用大规模动态图概率路径游走约束条件,提出一种节点相似Top-k查询方法。通过引入PageRank概率游走机制实现将基大图生成多个小规模单向图,并利用单边弱化因子对PageRank进行概率游走约束,避免单向图反复选取少数边的情况。采用Monte Carlo模拟法进行单向图集上的相似度累积计算,以Top-k取值为衡量准则递增游走步数,避免次优相似度叠加问题。结合图的动态性特点,依据局部自适应原则提出基大图触发更新策略与单向图集联动更新策略,在保证查询准确度的同时最大限度地降低更新维护代价。实验结果表明,与FR、KM、SimRank、P-SimRank等方法相比,该方法可有效提高查询效率、查询准确度与更新效率。

关键词: 大规模动态图, PageRank机制, 概率游走约束, 自适应更新, Top-k查询方法

Abstract: The existing large-scale dynamic graphs node similarity Top-k query methods for large-scale graphs are inefficient and fail to adaptively update the query results when the graph changes dynamically,which leads to a reduction in the accuracy of query results.This paper combines the constraint conditions of probabilistic path walking in large-scale dynamic graphs to propose a node similarity Top-k query method.By introducing the PageRank probabilistic walk mechanism,multiple small-scale unidirectional graphs can be generated from the base large graph,and the unilateral weakening factor is used to constrain PageRank probabilistic walk to prevent the unidirectional graph from repeatedly selecting a few edges.Then the Monte Carlo simulation method is used for the similarity accumulation calculation on the unidirectional graph set.The Top-k value is used as the measurement criterion to increase the number of walking steps to avoid suboptimal similarity stacking. In view of the dynamic characteristics of graphs,based on the principle of local adaptation,a base large graph trigger update strategy and a unidirectional graph set linkage update strategy are proposed to minimize the cost of update and maintenance while ensuring query accuracy.Experimental results show that compared with FR,KM,SimRank and P-SimRank methods,this method can effectively improve query efficiency,query accuracy and update efficiency.

Key words: large-scale dynamic graphs, PageRank mechanism, probabilistic walk constraint, adaptive update, Top-k query method

中图分类号: