Spark-based clique Community Search Algorithm Under Complex Attribute Condition

doi:10.19678/j.issn.1000-3428.0060167

Abstract

Abstract: Existing community search algorithms have difficulty finding communities that satisfy a given complex attribute condition in a network. Moreover, as networks keep growing, single-machine serial community search algorithms can no longer process large-scale network data effectively. To address the clique community search problem under complex attribute conditions, this paper proposes a Spark-based search algorithm. Built on the Spark parallel computing framework, the algorithm combines the structural features and content attributes of the graph and adopts different search strategies according to the complex attribute condition, which is defined by a Boolean expression. During the search, the search cost and extension cost of each attribute are used for local optimization, which speeds up the search. Experimental results show that, compared with structure-first and attribute-first community search algorithms, the proposed algorithm maintains search accuracy and improves search efficiency under different attribute conditions, network scales, and numbers of nodes.

Key words: community search, complex attribute condition, Boolean expression, Spark parallel computing framework, clique structure
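
To make the problem setting concrete, the following is a minimal single-machine sketch of an attribute-first baseline of the kind the abstract compares against. It is not the paper's Spark algorithm and does not model search cost or extension cost. It assumes, purely for illustration, that the Boolean attribute condition is given in disjunctive normal form, that a community is a maximal clique whose members all satisfy the condition, and that the names `satisfies`, `maximal_cliques`, `attribute_first_search`, and `min_size` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Assumed encoding: a literal is (attribute, required_presence), a clause is an
# AND of literals, and a condition is an OR of clauses (disjunctive normal form).
Literal = Tuple[str, bool]
Condition = List[List[Literal]]


def satisfies(attrs: Set[str], condition: Condition) -> bool:
    """Return True if the attribute set makes at least one clause true."""
    return any(
        all((attr in attrs) == wanted for attr, wanted in clause)
        for clause in condition
    )


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)  # v is fully explored
            x.add(v)     # exclude v from later branches

    expand(set(), set(adj), set())
    return found


def attribute_first_search(
    edges: Iterable[Tuple[str, str]],
    node_attrs: Dict[str, Set[str]],
    condition: Condition,
    min_size: int = 3,
) -> List[FrozenSet[str]]:
    """Filter nodes by the condition first, then search cliques among them."""
    keep = {v for v, attrs in node_attrs.items() if satisfies(attrs, condition)}
    adj: Dict[str, Set[str]] = {v: set() for v in keep}
    for u, v in edges:
        if u in keep and v in keep and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return [c for c in maximal_cliques(adj) if len(c) >= min_size]


# Toy graph (made-up data): condition is (db AND NOT theory) OR (ml AND theory).
node_attrs = {
    "a": {"db", "ml"}, "b": {"db"}, "c": {"db", "ml"},
    "d": {"ml"}, "e": {"db", "theory"},
}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"),
         ("b", "e"), ("c", "e")]
condition = [[("db", True), ("theory", False)], [("ml", True), ("theory", True)]]
communities = attribute_first_search(edges, node_attrs, condition)
print(communities)
```

On this toy input only nodes a, b, and c pass the attribute filter, so the single community returned is the triangle they form. A structure-first variant would instead enumerate cliques on the full graph and filter them afterwards; which order is cheaper depends on how selective the condition is.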

CLC number: TP18

SHE Xin, HE Zhenying. Spark-based clique Community Search Algorithm Under Complex Attribute Condition[J]. Computer Engineering, 2021, 47(12): 54-61,70.

https://www.ecice06.com/CN/Y2021/V47/I12/54


