
Computer Engineering ›› 2022, Vol. 48 ›› Issue (2): 99-105. doi: 10.19678/j.issn.1000-3428.0060194

• Artificial Intelligence and Pattern Recognition •

Rumor Stance Classification Algorithm Based on Variational Auto-Encoder

GUO Fengqi, MENG Fanrong, WANG Zhixiao

  1. College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2020-12-04 Revised: 2021-01-27 Published: 2021-01-30
  • About the authors: GUO Fengqi (born 1996), male, master's student; his main research interests are social networks and rumor stance detection. MENG Fanrong (corresponding author) and WANG Zhixiao are professors, Ph.D., and doctoral supervisors.
  • Funding:
    National Natural Science Foundation of China (61876186).

Abstract: Tweet data on Twitter is complex and unevenly distributed across classes, which makes rumor detection tasks difficult. To address this problem, a rumor stance classification algorithm based on the Variational Auto-Encoder (VAE), named VAE-LSTM, is proposed. After the data is preprocessed, the word2vec model is used to extract word vectors from each tweet, and these vectors are fed into the VAE for training. This yields a deep feature sequence that conforms to a simple probability distribution, from which effective features are sampled, preventing tweet categories with large amounts of data from dominating the feature vectors. A Long Short-Term Memory (LSTM) network then processes the resulting vector sequence to perform classification. Theoretical analysis and experimental results show that VAE-LSTM requires no manual feature extraction or addition, that its training process is simple and efficient, and that it alleviates the imbalance between classes. In actual scenarios, VAE-LSTM achieves an accuracy of 0.800 and an F1 score of 0.494, outperforming the temporal attention mechanism algorithm, the Turing algorithm, and the Hawkes Process (HP) algorithm. It also saves a large amount of data preprocessing time compared with SVM and other early machine learning methods.
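The sampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard VAE formulation (a diagonal Gaussian latent with a standard-normal prior and the reparameterization trick); the abstract does not state the authors' exact architecture, so the function names, the three-dimensional latent, and the encoder outputs below are hypothetical. The word2vec embedding and the LSTM classifier are omitted.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one tweet (assumed values, not from the paper).
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -2.0, 0.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
z = sample_latent(mu, log_var, rng)      # sampled deep feature vector
kl = kl_to_standard_normal(mu, log_var)  # regularizer pulling the latent toward N(0, I)
```

In a full pipeline of this kind, one such vector `z` per tweet would form the sequence passed to the LSTM, and the KL term would be added to the reconstruction loss during VAE training.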

Key words: Variational Auto-Encoder (VAE), Long Short-Term Memory (LSTM) network, social network, rumor stance, deep feature
