
Computer Engineering (计算机工程)

Special topic: Mobile Social Networking


基于标签的微博人脉网络挖掘算法和结构分析

王 莎,张连明   

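  1. (College of Physics and Information Science, Hunan Normal University, Changsha 410081)
  • Received: 2013-08-15 Publication date: 2014-05-15 Release date: 2014-05-14
  • About the authors: WANG Sha (born 1988), female, master's student, main research interest: social networks; ZHANG Lian-ming (corresponding author), professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (60973129); Natural Science Foundation of Guangdong Province (S2011010000812).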

Mining Algorithm and Structural Analysis of Microblog Interpersonal Relationship Network Based on Tag

WANG Sha, ZHANG Lian-ming   

  1. (College of Physics and Information Science, Hunan Normal University, Changsha 410081, China)
  • Received: 2013-08-15 Online: 2014-05-15 Published: 2014-05-14

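Abstract: In view of the widespread use of Internet microblog services and their impact on big data mining and analysis, a tag-based mining algorithm for microblog interpersonal relationship networks is proposed, and the structural characteristics of the network are analyzed. The algorithm works on the tags of microblog users; when computing the matching degree between words during fuzzy matching, it mainly considers three factors: word morphemes, order, and word length. To weaken the influence of the choice of starting user on the accuracy of the algorithm, ordinary users and celebrity users are each taken as starting users for mining the network data. The structural properties of the network are also studied, and the analysis shows that the microblog interpersonal relationship network has both small-world and scale-free properties. Experimental results show that the error rate of the algorithm is acceptable when mining the people interested in IT among the friends of celebrity users and ordinary users: the average error rate is 14.08% when mining the friends of 10 celebrity users and 10.63% when mining the friends of 10 ordinary users.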

Keywords: tag, microblog, interpersonal relationship network, fuzzy matching, data mining, structural characteristics

Abstract: In view of the widespread use of microblog services and their impact on data mining techniques, a mining algorithm for the microblog interpersonal relationship network is proposed based on fuzzy matching of tags, and the characteristics of the network are analyzed. Using the tags of the users, the algorithm mainly considers word morphemes, order, and word length when calculating the matching degree between words. To weaken the influence that the choice of starting user may have on the result, ordinary users and celebrities are used separately as starting points. The structural characteristics of the network are also studied, and the analysis shows that the network has both small-world and scale-free properties. Experimental results show that the error rate of mining the friends of celebrities and ordinary users who are interested in IT is acceptable. When mining the friends of 10 celebrity users, the average error rate of the algorithm is 14.08%; for the friends of 10 ordinary users it is 10.63%.

Key words: tag, microblog, interpersonal relationship network, fuzzy matching, data mining, structural characteristics
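
The abstract names three factors in the matching degree between tag words (morphemes, order, and word length) but does not give the formula. The sketch below is only one plausible way to combine them and is not the authors' method: the function names, the use of single characters as stand-ins for morphemes, the longest-common-subsequence order measure, and the weights are all assumptions made for illustration.

```python
from collections import Counter


def morpheme_score(a: str, b: str) -> float:
    """Share of characters the two words have in common.

    Assumption: each character stands in for one morpheme.
    """
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / (len(a) + len(b))


def order_score(a: str, b: str) -> float:
    """Longest common subsequence length, normalised by the shorter word."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / min(len(a), len(b))


def length_score(a: str, b: str) -> float:
    """Ratio of the shorter word length to the longer one."""
    return min(len(a), len(b)) / max(len(a), len(b))


def match_degree(a: str, b: str, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three factors; the weights are hypothetical."""
    if not a or not b:
        return 0.0
    w_morpheme, w_order, w_length = weights
    return (w_morpheme * morpheme_score(a, b)
            + w_order * order_score(a, b)
            + w_length * length_score(a, b))


if __name__ == "__main__":
    # Compare a user's tag with a target interest tag.
    for tag in ("互联网", "物联网", "摄影"):
        print(tag, round(match_degree("互联网", tag), 3))
```

Under these assumptions, a user would be counted as interested in a topic when some tag's matching degree against the topic word exceeds a chosen threshold; the threshold, like the weights, would have to be tuned on real data.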

CLC number: