
Computer Engineering, 2022, Vol. 48, Issue (8): 292-298, 305. doi: 10.19678/j.issn.1000-3428.0062008

• Development Research and Engineering Application •

Sentiment Analysis Model of College Student Forum Based on RoBERTa-WWM

WANG Shuyan, YUAN Ke

  1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Received: 2021-07-12 Revised: 2021-09-26 Published: 2022-08-09
  • About the authors: WANG Shuyan (born 1964), female, professor, Ph.D.; main research interests are software testing, data mining, and intelligent information processing. YUAN Ke, master's student.
  • Funding:
    Key Research and Development Program of Shaanxi Province (2020GY-010); Xi'an Science and Technology Plan Project (2019218114GXRC017CG018-GXYD17.10).

Abstract: Posts on college student forums are short, colloquial, and rich in popular slang. Traditional sentiment analysis models struggle to represent the semantic features of such text accurately, and they pay insufficient attention to the local features and context of the sentences. To address this, a sentiment analysis model for college students based on RoBERTa-WWM is proposed. The RoBERTa-WWM model converts forum sentences into semantic feature representations, which are fed into a Text Convolutional Neural Network (TextCNN) to extract the local semantic features of the sentences. A Bidirectional Gated Recurrent Unit (BiGRU) network then processes these local semantic features in both directions to obtain comprehensive contextual semantic information. On this basis, a Softmax classifier computes the probability vector of the sentence over the sentiment labels, and the label with the maximum probability is selected as the final output. The experimental results show that, compared with models such as RoBERTa-WWM, EK-INIT-CNN, and BERT, the proposed model achieves better classification performance on the college student forum and NLPCC2014 datasets. On the college student forum dataset, its macro-average precision, macro-average recall, macro-average F1, and micro-average F1 are 89.43%, 90.43%, 90.12%, and 92.48%, respectively.
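The final classification step and the four reported metrics can be illustrated with a small, library-free sketch. It is not the authors' code: the three-label set, the toy logits, and the function names are hypothetical, and macro-average F1 is assumed here to be the mean of the per-class F1 values, a definition the abstract does not state.

```python
import math
from collections import Counter


def softmax(logits):
    """Turn raw class scores into a probability vector (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Pick the label whose softmax probability is largest."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


def macro_micro_scores(y_true, y_pred, labels):
    """Return macro precision, macro recall, macro F1, and micro F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def ratio(num, den):
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = ratio(tp[label], tp[label] + fp[label])
        r = ratio(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(ratio(2 * p * r, p + r))

    n = len(labels)
    # Micro averaging pools the counts over all classes before dividing.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = ratio(tp_all, tp_all + fp_all)
    micro_r = ratio(tp_all, tp_all + fn_all)
    return {
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        # Assumption: macro F1 is the mean of per-class F1 values.
        "macro_f1": sum(f1s) / n,
        "micro_f1": ratio(2 * micro_p * micro_r, micro_p + micro_r),
    }


# Hypothetical label set and toy logits, for illustration only.
LABELS = ["negative", "neutral", "positive"]
toy_logits = [[2.0, 0.1, -1.0], [0.2, 0.3, 1.5], [0.0, 1.2, 0.4], [1.1, 0.9, -0.3]]
toy_truth = ["negative", "positive", "neutral", "neutral"]
toy_pred = [predict_label(row, LABELS) for row in toy_logits]
scores = macro_micro_scores(toy_truth, toy_pred, LABELS)
```

Macro averaging weights every sentiment class equally, whereas micro averaging weights every sentence equally. On a dataset with imbalanced classes the two can therefore differ, as they do in the figures reported above.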

Key words: deep learning (DL), sentiment analysis of college students, RoBERTa-WWM model, Text Convolutional Neural Network (TextCNN), Bidirectional Gated Recurrent Unit (BiGRU) network
