Computer Engineering

• Artificial Intelligence and Recognition Technology •

A Community Detection Algorithm Based on Local Similarity

WU Zhonggang, LÜ Zhao

  1. (Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China)
  • Received: 2015-11-17 Online: 2016-12-15 Published: 2016-12-15
  • About the authors: WU Zhonggang (born 1990), male, master's student, main research interest: big data analysis; LÜ Zhao (corresponding author), associate professor.
  • Supported by:
    Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (1451110700, 14511106803); Special Development Fund of the Shanghai Zhangjiang National Innovation Demonstration Zone (201411-JA-B108-002).

Abstract: Most existing community detection algorithms consider only the topological structure of the graph or only the attribute information of its nodes. Attributed graph clustering algorithms that combine the two produce communities of unsatisfactory quality, while local similarity algorithms based on shared neighbors do not fully measure, and tend to underestimate, the similarity between node pairs. To address these problems, this paper proposes a new Local Similarity based Community Detection (LS-CD) algorithm. The algorithm consists of two modules: local-similarity-based node similarity calculation and node clustering. It uses the PageRank algorithm to compute node importance from the link structure of the graph, measures the link strength between nodes, and combines that strength with node attributes to obtain the pairwise node similarity. To avoid the underestimation inherent in shared-neighbor similarity, the similarity between the neighbor sets of two nodes is introduced as their local similarity. The K-Medoids clustering algorithm then decides which cluster each node belongs to from the local similarity between the node and each cluster's center node, which yields the community partition. Experimental results show that, compared with classical algorithms such as SA-Cluster and k-SNAP, the proposed algorithm finds higher-quality communities and partitions the graph into communities more effectively.

Key words: community detection, graph clustering, attributed graph, node importance, local similarity, node similarity
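
The abstract names the stages of the pipeline but not its formulas, so the following Python sketch is only an illustration of how such stages could fit together, not the authors' LS-CD implementation. Every formula and name in it is an assumption: the link strength (average PageRank of two adjacent nodes, normalized by the largest rank), the attribute similarity (fraction of matching attribute values), the blending weight `alpha`, the best-match average over closed neighborhoods used as local similarity, and the farthest-first choice of initial medoids.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return rank


def pair_similarity(u, v, adj, rank, attrs, alpha=0.5):
    """Assumed blend of link strength and attribute agreement, in [0, 1]."""
    if u == v:
        return 1.0
    top = max(rank.values())
    # Assumption: adjacent nodes are linked as strongly as they are important.
    link = (rank[u] + rank[v]) / (2 * top) if v in adj[u] else 0.0
    # Assumption: attributes are equal-length tuples compared position by position.
    attr = sum(x == y for x, y in zip(attrs[u], attrs[v])) / len(attrs[u])
    return alpha * link + (1 - alpha) * attr


def local_similarity(u, v, adj, pair):
    """Assumed neighbor-set similarity: average best match between the
    closed neighborhoods of u and v, instead of counting shared neighbors."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    best_u = sum(max(pair[a][b] for b in nv) for a in nu)
    best_v = sum(max(pair[a][b] for a in nu) for b in nv)
    return (best_u + best_v) / (len(nu) + len(nv))


def k_medoids(nodes, sim, k, iters=20):
    """K-Medoids on a similarity table; returns {medoid: [member nodes]}."""
    medoids = [nodes[0]]
    while len(medoids) < k:  # farthest-first initialization (an assumption)
        rest = [v for v in nodes if v not in medoids]
        medoids.append(min(rest, key=lambda v: max(sim[v][m] for m in medoids)))
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for v in nodes:
            # A medoid always stays in its own cluster; others join the most similar medoid.
            owner = v if v in clusters else max(medoids, key=lambda m: sim[v][m])
            clusters[owner].append(v)
        # The new medoid of a cluster is the member most similar to all members.
        updated = [max(c, key=lambda x: sum(sim[x][y] for y in c))
                   for c in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return clusters


def detect_communities(adj, attrs, k):
    nodes = sorted(adj)
    rank = pagerank(adj)
    pair = {u: {v: pair_similarity(u, v, adj, rank, attrs) for v in nodes}
            for u in nodes}
    sim = {u: {v: local_similarity(u, v, adj, pair) for v in nodes}
           for u in nodes}
    return k_medoids(nodes, sim, k)


# Toy attributed graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
attrs = {0: ("a",), 1: ("a",), 2: ("a",), 3: ("b",), 4: ("b",), 5: ("b",)}
communities = detect_communities(adj, attrs, k=2)
```

On this toy graph the two triangles come out as separate clusters. Which member ends up as each medoid can depend on tie-breaking, so it is safer to compare the member lists than the dictionary keys.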

CLC number: