Computer Engineering (计算机工程), 2019, Vol. 45, Issue (3): 278-285, 292. doi: 10.19678/j.issn.1000-3428.0050754

• Development Research and Engineering Application •

Fake Reviews Identification Method Fusing Time Series and Multi-scale Features

DI Ruitong1a,2, WANG Hong1a,1b,2, FANG Youli1a,2

  1. 1a.School of Information Science and Engineering; 1b.College of Life Science, Shandong Normal University, Jinan 250358, China; 2.Shandong Provincial Key Laboratory of Distributed Computer Software Novel Technology, Jinan 250014, China
  • Received: 2018-03-13 Online: 2019-03-15 Published: 2019-03-15
  • About the authors: DI Ruitong (born 1993), female, master's student, main research interests: data mining and machine learning; WANG Hong (corresponding author), professor, Ph.D.; FANG Youli, master's student.
  • Supported by:

    National Natural Science Foundation of China (61672329, 61373149); Shandong Provincial Education Science Planning Project (ZK1437B010).

Abstract:

This paper proposes an improved fake review identification method that combines time series with multi-scale features. Considering the influence of time on ratings and their distribution, it constructs a fake review identification model based on multi-dimensional time series to extract abnormal review features. The abnormal review features are then divided hierarchically, and benchmark-scale features and subdivision-scale features are obtained following the multi-scale feature idea. A clustering algorithm based on density peaks is used to identify fake reviews and to improve the noise immunity of the identification model. Experimental results show that, compared with density peaks clustering methods for fake review identification based on benchmark-scale features and on multi-scale features, the proposed method reaches an AUC value of 92% and identifies fake reviews more accurately.

Key words: fake review, time series, multi-scale, Principal Component Analysis (PCA), clustering
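
The abstract names density peaks clustering as the identification step but gives no implementation details. As a rough illustration only, the generic density-peaks procedure might be sketched as follows. The function name `density_peaks`, the Gaussian density kernel, the cutoff `d_c`, and the fixed `n_clusters` are all assumptions made for this sketch and are not taken from the paper; the paper's time-series and multi-scale feature extraction is not reproduced here.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Generic density-peaks clustering sketch (not the paper's exact model).

    points: list of equal-length numeric tuples (hypothetical review features).
    d_c: cutoff distance controlling the density kernel width (assumed).
    n_clusters: number of cluster centers to pick (assumed to be known).
    Returns (labels, rho, delta).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-weighted count of neighbours.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta_i: distance to the nearest point that ranks as denser.
    delta = [0.0] * n
    nearest_denser = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Centers: the densest point plus the points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[: n_clusters - 1]

    # Each remaining point inherits the label of its nearest denser point.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, rho, delta
```

Calling `density_peaks(features, d_c=1.0, n_clusters=2)` on a list of feature tuples returns one label per point together with the `rho` and `delta` values. Points with low `rho` and high `delta` are isolated from every dense region, which is one plausible way to surface anomalous reviews, though the abstract does not say how the paper maps clusters to fake reviews.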

CLC Number: