作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2021, Vol. 47 ›› Issue (12): 54-61,70. doi: 10.19678/j.issn.1000-3428.0060167

• 人工智能与模式识别 • 上一篇    下一篇

复杂属性条件下基于Spark的clique社区搜索算法

佘鑫1, 何震瀛2   

  1. 1. 复旦大学 软件学院, 上海 200441;
    2. 复旦大学 计算机科学技术学院, 上海 200441
  • 收稿日期:2020-12-02 修回日期:2021-01-28 发布日期:2021-12-08
  • 作者简介:佘鑫(1996-),男,硕士研究生,主研方向为社区搜索;何震瀛,副教授、博士。
  • 基金资助:
    国家重点研发计划“精准公共法律服务支撑技术与装备研究”(2018YFC0830900)。

Spark-based clique Community Search Algorithm Under Complex Attribute Condition

SHE Xin1, HE Zhenying2   

  1. 1. Software School, Fudan University, Shanghai 200441, China;
    2. School of Computer Science, Fudan University, Shanghai 200441, China
  • Received:2020-12-02 Revised:2021-01-28 Published:2021-12-08

摘要: 现有的社区搜索算法难以在网络中找到满足给定复杂属性条件的社区。同时,随着网络规模的不断扩大,单机串行的社区搜索算法也已无法有效地处理大规模的网络数据。针对复杂属性条件下的clique社区搜索问题,提出一种基于Spark的搜索算法。在Spark并行计算框架的基础上,结合图的结构特征和内容属性,根据由布尔表达式定义的复杂属性条件采取不同的搜索策略,搜索时利用属性的搜索成本和扩展成本进行局部优化,从而加快搜索过程。实验结果表明,与结构优先或属性优先的社区搜索算法相比,该算法在不同属性条件、网络规模和节点数目的情况下均能保证搜索准确性并提高搜索效率。

关键词: 社区搜索, 复杂属性条件, 布尔表达式, Spark并行计算框架, clique结构

Abstract: Existing community search algorithms often fail to find the communities that satisfy the given complex attribute conditions in networks.At the same time, single-machine serial community search algorithms are not capable of processing massive network data generated by scaling networks.To address the problem, this paper proposes a Spark-based community search algorithm under complex attribute condition.The algorithm is constructed by using the parallel computing framework of Spark.Based on the structural features and content attributes of the graph, different search strategies are used according to the complex attribute conditions defined by Boolean expressions.The search cost and extension cost of the attribute are used for partial optimization to speed up the search process.Experimental results show that compared with the proposed structure-first community search algorithm and attribute-first community search algorithm, the proposed algorithm displays a higher search efficiency with the accuracy ensured in the cases of different network scales, numbers of nodes, and attribute conditions.

Key words: community search, complex attribute condition, Boolean expression, Spark parallel computing framework, clique structure

中图分类号: