Computer Engineering, 2019, Vol. 45, Issue (7): 6-12. doi: 10.19678/j.issn.1000-3428.0052048

• Advanced Computing and Data Processing •

An Efficient Distributed Detection Algorithm for Spammer Group

ZHANG Lu1, ZHU Haiting2

  1. College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210046, China;
  2. College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2018-07-09 Revised: 2018-08-29 Online: 2019-07-15 Published: 2019-07-23
  • About the authors: ZHANG Lu (张璐; born 1983), male, lecturer, Ph.D.; main research interests are data mining and distributed computing. ZHU Haiting (朱海婷; corresponding author), lecturer, Ph.D.
  • Funding:
    National Key Research and Development Program of China (2017YFD0401002); National Natural Science Foundation of China (71801123, 91646204, 61502250); Scientific Research Start-up Fund for Introduced Talents of Nanjing University of Posts and Telecommunications (NY214188).

Abstract: To process large-scale, real-world user data quickly in e-commerce spammer group detection, a distributed detection algorithm for spammer groups is proposed. A candidate group extraction algorithm based on cosine pattern mining is designed. It measures the coupling among group members by cosine similarity, so that candidate groups are extracted accurately and the computational cost of the subsequent identification step is reduced. By combining the group projection technique with the Spark computing framework, a distributed group extraction algorithm is then proposed to speed up group detection. Experiments and a case study on real datasets show that the proposed algorithm maintains detection accuracy while running efficiently.

Key words: spammer group detection, detection efficiency, cosine pattern, tightly-coupled group, group projection, distributed computing framework
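
The abstract says that cosine similarity measures the coupling among group members but does not state the formula. As an illustration only, the sketch below assumes the generalized cosine measure commonly used in cosine pattern mining: the support of the whole group divided by the geometric mean of the members' individual supports. The function names, the `reviewers_by_product` layout, and the toy data are hypothetical and are not taken from the paper.

```python
from math import prod


def support(group, reviewers_by_product):
    """Fraction of products that were reviewed by every user in `group`."""
    group = set(group)
    hits = sum(1 for reviewers in reviewers_by_product.values() if group <= reviewers)
    return hits / len(reviewers_by_product)


def group_cosine(group, reviewers_by_product):
    """Assumed coupling score: supp(group) / geometric mean of single-user supports."""
    members = list(group)
    if not members:
        raise ValueError("group must contain at least one user")
    singles = [support([user], reviewers_by_product) for user in members]
    if min(singles) == 0:
        return 0.0  # a user with no reviews cannot be coupled with anyone
    geometric_mean = prod(singles) ** (1.0 / len(members))
    return support(members, reviewers_by_product) / geometric_mean


# Hypothetical toy data: product id -> set of users who reviewed it.
reviews = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u1", "u2", "u3"},
    "p3": {"u1", "u2", "u4"},
    "p4": {"u4", "u5"},
}

tight = group_cosine({"u1", "u2"}, reviews)  # users who always review together
loose = group_cosine({"u1", "u4"}, reviews)  # users who overlap on one product only
```

Under this assumed measure, a group whose members always review the same products scores close to 1, while loosely overlapping users score much lower, so a minimum-cosine threshold can prune weakly coupled candidates before any more expensive identification step.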

CLC Number: