Node Similarity Top-k Query Method with Probabilistic Walk Constraint in Large-Scale Dynamic Graphs

doi:10.19678/j.issn.1000-3428.0059192

Computer Engineering, 2021, Vol. 47, Issue (1): 72-78,86.

CHEN Ze¹, DING Linlin²,³, SONG Baoyan¹, WANG Junlu¹

1. College of Information, Liaoning University, Shenyang 110036, China;
2. Shandong Energy Xinwen Mining Group Co., Ltd., Taian, Shandong 271200, China;
3. School of Resources and Civil Engineering, Northeastern University, Shenyang 110004, China

Received: 2020-08-07 Revised: 2020-09-17 Published: 2020-10-22
About the authors: CHEN Ze (born 1996), male, M.S., main research interest: large-scale graph data processing; DING Linlin, associate professor, Ph.D.; SONG Baoyan, professor, Ph.D.; WANG Junlu (corresponding author), Ph.D. candidate.
Funding: National Natural Science Foundation of China (61472169, 61502215); China Postdoctoral Science Foundation General Program (2020M672134); Key Research and Development Program of Liaoning Province (2017231011); Scientific Research Project of the Education Department of Liaoning Province (LJC201913); Shenyang Young and Middle-aged Science and Technology Innovation Talent Support Program (RC180244).

Abstract

Existing node similarity Top-k query methods for large-scale dynamic graphs are inefficient on large graphs and cannot adaptively update query results when the graph changes, which reduces the accuracy of the results. This paper uses constraint conditions on probabilistic path walks in large-scale dynamic graphs to propose a node similarity Top-k query method. A PageRank probabilistic walk mechanism is introduced to generate multiple small-scale unidirectional graphs from the base large graph, and a unilateral weakening factor constrains the PageRank probabilistic walk so that the unidirectional graphs do not repeatedly select the same few edges. The Monte Carlo simulation method is then used to accumulate similarity over the set of unidirectional graphs, and the number of walk steps is increased incrementally with the Top-k value as the criterion, which avoids the accumulation of suboptimal similarities. In view of the dynamic nature of graphs, and following the principle of local adaptation, a trigger update strategy for the base large graph and a linked update strategy for the unidirectional graph set are proposed, which minimize update and maintenance cost while preserving query accuracy. Experimental results show that, compared with the FR, KM, SimRank, and P-SimRank methods, the proposed method effectively improves query efficiency, query accuracy, and update efficiency.

Key words: large-scale dynamic graphs, PageRank mechanism, probabilistic walk constraint, adaptive update, Top-k query method
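The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea it describes, not the authors' algorithm. It assumes a SimRank-style setting: each sampled "unidirectional graph" keeps one randomly chosen in-neighbor per node, a hypothetical `weaken` multiplier lowers the sampling weight of an edge every time it is picked, and similarity is accumulated by Monte Carlo averaging over the sampled graphs. The function names, the `decay` constant, and the fixed `max_steps` are all assumptions made for this example; the paper instead grows the number of walk steps using the Top-k value as the criterion.

```python
import heapq
import random
from collections import defaultdict


def sample_one_way_graphs(in_nbrs, n_graphs, weaken=0.5, seed=0):
    """Sample graphs in which every node keeps exactly one in-neighbor.

    `weaken` is a hypothetical stand-in for a weakening factor: each time an
    edge is chosen, its sampling weight is multiplied by `weaken`, so later
    samples are less likely to reuse the same few edges.
    """
    rng = random.Random(seed)
    weight = {(u, v): 1.0 for v, us in in_nbrs.items() for u in us}
    graphs = []
    for _ in range(n_graphs):
        parent = {}
        for v, us in in_nbrs.items():
            if not us:
                continue  # nodes without in-neighbors have no outgoing pointer
            u = rng.choices(us, weights=[weight[(x, v)] for x in us])[0]
            parent[v] = u
            # Floor the weight so it never underflows to zero.
            weight[(u, v)] = max(weight[(u, v)] * weaken, 1e-12)
        graphs.append(parent)
    return graphs


def top_k_similar(in_nbrs, query, k, n_graphs=200, max_steps=5,
                  decay=0.6, seed=0):
    """Monte Carlo estimate of a SimRank-style score, then Top-k selection."""
    graphs = sample_one_way_graphs(in_nbrs, n_graphs, seed=seed)
    score = defaultdict(float)
    for parent in graphs:
        # Positions of the query node after 1, 2, ... steps in this sample.
        q_path, node = [], query
        for _ in range(max_steps):
            node = parent.get(node)
            if node is None:
                break
            q_path.append(node)
        for cand in in_nbrs:
            if cand == query:
                continue
            node = cand
            for t, q_node in enumerate(q_path, start=1):
                node = parent.get(node)
                if node is None:
                    break
                if node == q_node:  # both walks first meet after t steps
                    score[cand] += decay ** t / n_graphs
                    break
    return heapq.nlargest(k, score.items(), key=lambda kv: kv[1])


# Toy graph: each key maps a node to the list of its in-neighbors.
toy = {"a": ["c"], "b": ["c"], "c": [], "d": ["e"], "e": []}
result = top_k_similar(toy, "a", k=2)
```

In the toy graph, `b` is the only node that shares an in-neighbor with `a`, so it is the only node that receives a score. The sketch rescans every candidate for every sample, which is fine for illustration but would not scale to the graph sizes the paper targets.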

CLC number: TP311

CHEN Ze, DING Linlin, SONG Baoyan, WANG Junlu. Node Similarity Top-k Query Method with Probabilistic Walk Constraint in Large-Scale Dynamic Graphs[J]. Computer Engineering, 2021, 47(1): 72-78,86.

https://www.ecice06.com/CN/Y2021/V47/I1/72

Figures/Tables: 7
