Computer Engineering ›› 2018, Vol. 44 ›› Issue (9): 192-198. doi: 10.19678/j.issn.1000-3428.0048352

• Artificial Intelligence and Recognition Technology •

Overlapping Community Discovery Algorithm Based on Node Importance and Similarity

FU Rao, MENG Fanrong, XING Yan

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2017-08-14 Online: 2018-09-15 Published: 2018-09-15
  • About the authors: FU Rao (born 1992), male, master's student, whose main research interests are community discovery and clustering; MENG Fanrong, professor, Ph.D.; XING Yan, Ph.D.
  • Funding:

    National Key Research and Development Program of China (2016YFC0600908); National Natural Science Foundation of China (61402482, 61572505, 51674255).

Abstract:

When overlapping communities are detected in complex networks, the existing Fuzzy C-Means (FCM) algorithm relies on random initialization, so repeated runs produce inconsistent community partitions. To address this problem, a new overlapping community discovery algorithm is proposed. Node importance is introduced to quantify how important each node is in the network. Several core nodes are then selected, according to the node importance ranking and the shortest paths between nodes, to serve as the initial cluster centers of FCM, which improves the stability of FCM. A node similarity measure based on s-hop shortest paths is used to build a similarity matrix that carries richer information, which improves the accuracy of the algorithm. The similarity matrix is transformed by spectral clustering, the modified FCM is run in the resulting space to obtain the membership matrix of the nodes, and each node's community affiliations are assigned according to a threshold. Experimental results show that the proposed algorithm yields a unique community partition, and that on the Karate and Dolphins datasets its NMI is more than 8% higher than that of GCE, INFOMAP, COPRA and other algorithms.

Key words: Fuzzy C-Means (FCM), node importance, shortest path, community detection, spectral clustering
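The abstract gives no formulas, so the following is only a minimal sketch of two of the ideas it describes: choosing initial centers deterministically from an importance ranking plus shortest-path distances, and assigning overlapping communities by thresholding a membership matrix. It is not the authors' method. Using degree as the importance score, the `min_hops` separation rule, and all function names are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_seeds(adj, k, min_hops=2):
    """Pick up to k seed nodes in decreasing order of importance.

    ASSUMPTION: importance is the node degree, and a candidate is accepted
    only if it lies at least `min_hops` hops from every seed chosen so far.
    Ties are broken by node id, so the result is the same on every run.
    """
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    seeds, seed_dists = [], []
    for node in ranked:
        if len(seeds) == k:
            break
        if all(d.get(node, float("inf")) >= min_hops for d in seed_dists):
            seeds.append(node)
            seed_dists.append(bfs_distances(adj, node))
    return seeds


def assign_overlapping(membership, threshold):
    """Map each node to every community whose membership degree >= threshold."""
    return {
        node: [c for c, u in enumerate(row) if u >= threshold]
        for node, row in membership.items()
    }


# Toy graph: two triangles (0-1-2 and 4-5-6) joined through bridge node 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
seeds = select_seeds(adj, k=2)  # nodes 2 and 4: highest degree, two hops apart
```

Because nothing in the selection is random, repeated runs give the same seeds, which is the property the abstract attributes to its initialization. A node such as the bridge node 3, with comparable membership in both communities, would be placed in both by `assign_overlapping` when the threshold is low enough.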

CLC number: