Computer Engineering ›› 2020, Vol. 46 ›› Issue (11): 61-69. doi: 10.19678/j.issn.1000-3428.0056187

• Artificial Intelligence and Pattern Recognition •

Comprehensive User Influence Measurement Combining Spark and Recessive Interest

TONG Manqi1a, HUANG Jiangsheng2, GUO Kun1b

  1. Fuzhou University: a. Fujian Provincial Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education; b. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350002, China;
  2. State Grid Info-Telecom Great Power Science and Technology Co., Ltd., Fuzhou 350003, China
  • Received: 2019-10-08 Revised: 2019-11-08 Published: 2019-11-12
  • About the authors: TONG Manqi (born 1993), female, master's student, main research interest: data mining; HUANG Jiangsheng, engineer; GUO Kun (corresponding author), associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61300104); Fujian Province Program for New Century Excellent Talents in Universities (JA13021); Fujian Provincial Science Fund for Distinguished Young Scholars (2015J06014); Fujian Province University-Industry Cooperation Project (2017H6008).

Abstract: Traditional user influence measurement algorithms slow down when processing massive data. To address this problem, this paper proposes a comprehensive user influence measurement algorithm based on recessive interest. The Latent Dirichlet Allocation (LDA) model is used to obtain users' recessive interest preferences, and the optimal number of interest topics is determined jointly from the perplexity and the average topic similarity. The transition rate of user interest propagation in the PageRank algorithm is then modified to obtain the User Interest Factor (UIF), that is, the propagation influence of a user's recessive interests. Finally, on the Spark computing framework, the Analytic Hierarchy Process (AHP) is used to combine the user's own influence with the UIF and compute the final user influence. Experimental results show that the proposed algorithm considers both user interests and the user's own influence factors, and can therefore evaluate the real influence of users more objectively and efficiently.

Key words: user influence, user interest similarity, PageRank algorithm, Spark computing framework, Latent Dirichlet Allocation (LDA) model
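
The pipeline summarized in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation: the abstract does not give the modified transition rate or the AHP judgment matrix, so the sketch assumes that a follower's score is split among the users they follow in proportion to the cosine similarity of their topic distributions, and it approximates the AHP weights with the row geometric-mean method. The names `interest_pagerank`, `ahp_weights`, `topics`, `follows`, and `own`, as well as all numeric values, are hypothetical; the topic vectors stand in for LDA output, and no Spark API is used.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def interest_pagerank(follows, topics, damping=0.85, iters=50):
    """PageRank where a follower's score is split among followees in
    proportion to topic similarity (an assumed weighting, not the paper's)."""
    users = sorted(topics)
    n = len(users)
    # Transition weights: follower u -> followee v, normalized per follower.
    trans = {}
    for u in users:
        sims = {v: cosine(topics[u], topics[v]) for v in follows.get(u, [])}
        total = sum(sims.values())
        trans[u] = {v: s / total for v, s in sims.items()} if total else {}
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        # Score held by users with no outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in users if not trans[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            for v, w in trans[u].items():
                new[v] += damping * rank[u] * w
        rank = new
    return rank


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric-mean method."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


# Hypothetical inputs: per-user topic distributions and a follow graph.
topics = {"a": [0.7, 0.2, 0.1], "b": [0.6, 0.3, 0.1], "c": [0.1, 0.1, 0.8]}
follows = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
uif = interest_pagerank(follows, topics)

# Hypothetical judgment: UIF is twice as important as the user's own influence.
w_self, w_uif = ahp_weights([[1.0, 0.5], [2.0, 1.0]])
own = {"a": 0.5, "b": 0.3, "c": 0.2}  # made-up self-influence scores
score = {u: w_self * own[u] + w_uif * uif[u] for u in topics}
```

In this toy graph, user `a` is followed by both other users and so receives the largest interest-propagation score, while `b` outranks `c` because its topic distribution is closer to that of `a`. On a real dataset the same iteration would be expressed over distributed edge and vertex collections rather than Python dictionaries.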
