作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2022, Vol. 48 ›› Issue (10): 55-66. doi: 10.19678/j.issn.1000-3428.0062639

• 人工智能与模式识别 • 上一篇    下一篇

基于贪婪策略的紧密k核子图查询

赵丹枫1, 姚贤标1, 包晓光1, 黄冬梅2, 郭伟其3   

  1. 1. 上海海洋大学 信息学院, 上海 201306;
    2. 上海电力大学 电子与信息工程学院, 上海 200090;
    3. 国家海洋局 东海海洋环境调查勘察中心, 上海 200137
  • 收稿日期:2021-09-08 修回日期:2021-11-01 发布日期:2022-10-09
  • 作者简介:赵丹枫(1982—),女,讲师、博士,主研方向为图计算、大数据技术;姚贤标,硕士研究生;包晓光,讲师、博士;黄冬梅(通信作者)、郭伟其,教授。
  • 基金资助:
    国家自然科学基金青年科学基金项目(42106190);上海市科委地方能力建设项目(20050501900)。

Closely Related k-Core Subgraph Query Based on Greedy Strategy

ZHAO Danfeng1, YAO Xianbiao1, BAO Xiaoguang1, HUANG Dongmei2, GUO Weiqi3   

  1. 1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China;
    2. College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China;
    3. East China Sea Marine Environment Survey and Survey Center, State Oceanic Administration, Shanghai 200137, China
  • Received:2021-09-08 Revised:2021-11-01 Published:2022-10-09

摘要: k核查询是一种社团查询,由于其可以在线性时间内被有效计算,因此在社团检测中具有较广泛的应用。图中边的权值在很多场景下具有较强的语义关系,但现有研究较少考虑图中边的权值。为提升k核查询的效率,在k核的基础上定义加权图中的紧密k核子图查询(CRKSQ)问题,并使用归约方法证明该问题是NP-难的。基于贪婪策略设计启发式算法CRK-G,通过迭代删除节点为CRKSQ问题找到一个近似解。在此基础上,从降低图规模和减少迭代次数两方面研究CRK-G算法的优化策略,分别提出使用图压缩策略的算法CRK-C及使用单次多节点删除策略的算法CRK-F。在Bio-GRID、Email-Enron、DBLP 3个数据集上的实验结果表明,相对于CRK-G算法,CRK-C、CRK-F算法在查询速度上有较大的提升,且平均误差均在8%以内。

关键词: 社团检测, k核, 加权图, 紧密子图, 贪婪策略

Abstract: k-core query is a type of community query because it can be calculated effectively in linear time and has a wide range of applications in community detection.In some scenarios, the weights of edges in graphs exhibit strong semantics, but the weights of edges in graphs were rarely considered in previous studies.First, the Closely Related k-core Subgraph Query(CRKSQ) problem in weighted graphs is defined based on the k-core, and the problem is confirmed to be Non-deterministic Polynomial-time hard(NP-hard) using the reduction method.Second, a heuristic algorithm CRK-G is designed based on the greedy strategy to obtain an approximate solution to the CRKSQ problem by iteratively deleting nodes.Finally, the optimization strategy of the CRK-G algorithm is evaluated based on two aspects:reducing the graph size and the number of iterations.This study proposes a CRK-C algorithm using the graph compression strategy and a CRK-F algorithm using the single-time multinode deletion strategy.The experimental results for three datasets (Bio-GRID, Email-Enron, and DBLP) show that compared with the CRK-G algorithm, the CRK-C and CRK-F algorithms exhibited significantly improved query speed, and the average error is within 8%.

Key words: community detection, k-core, weighted graph, closely related subgraph, greedy strategy

中图分类号: