Computer Engineering

• Advanced Computing and Data Processing

Identification Method of Microblog Hype Account Based on Feature Analysis

ZHANG Jin, LIU Yan, LUO Junyong, DONG Yuchen

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China)
  • Received: 2014-04-18 Online: 2015-04-15 Published: 2015-04-15
  • About the authors: ZHANG Jin (born 1989), male, master's student, research interests: data mining and network information security; LIU Yan, associate professor; LUO Junyong, professor; DONG Yuchen, master's student.
  • Funding:
    National Natural Science Foundation of China (61309007); National "863" Program of China (2012AA012902); National Science and Technology Support Program of China (2012BAH47B01).

Abstract: In recent years, hype accounts have appeared on microblog platforms that use rule-violating means to run online public relations campaigns, seriously disrupting the normal order of the Internet. Traditional methods for identifying hype accounts rely mainly on manual analysis, which is inefficient and unsuitable for screening massive numbers of accounts. To address these problems, an improved method for identifying microblog hype accounts is proposed. The features of hype accounts are analyzed from three aspects (account status, historical microblogs, and account neighbors), and a hype account feature set is constructed. Normal accounts and hype accounts are then classified automatically using data mining classification algorithms, including Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). Experimental results show that the method identifies hype accounts in microblogs effectively, with an accuracy as high as 95%.

Key words: microblog, hype account, feature analysis, feature selection, data mining, classification algorithm
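
The classification step named in the abstract can be illustrated with a minimal K-Nearest Neighbor sketch. It is not the authors' implementation: the three features (one per aspect mentioned in the abstract), the toy training accounts, and the function name `knn_classify` are all hypothetical and chosen only to show how a feature vector might be mapped to a "normal" or "hype" label.

```python
import math
from collections import Counter

# Hypothetical feature vectors, one per account, each value already scaled to [0, 1]:
# (follower_to_following_ratio, repost_fraction, suspicious_neighbor_fraction)
# The features and every number below are invented for illustration;
# they are not the data or the feature set used in the paper.
TRAIN = [
    ((0.80, 0.10, 0.05), "normal"),
    ((0.70, 0.20, 0.10), "normal"),
    ((0.90, 0.15, 0.00), "normal"),
    ((0.10, 0.90, 0.70), "hype"),
    ((0.20, 0.85, 0.80), "hype"),
    ((0.15, 0.95, 0.60), "hype"),
]


def knn_classify(features, train=TRAIN, k=3):
    """Return the majority label among the k training accounts nearest to `features`."""
    # Sort labeled accounts by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(features, item[0]))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify((0.12, 0.88, 0.75)))
    print(knn_classify((0.85, 0.12, 0.03)))
```

An odd `k` avoids tied votes in a two-class problem. In practice the features would need to be normalized to comparable ranges before distances are computed, since KNN is sensitive to scale.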

CLC number: