Design of a Machine Learning-Based Sentiment Analysis Model for Government Weibo

doi:10.19678/j.issn.1000-3428.0068530

Abstract

To address the problem that comments on government Weibo are cluttered and difficult to review, this paper proposes a machine learning-based sentiment analysis model. The model quantifies the sentiment expressed on government Weibo and provides a reliable basis for automatic review. Taking the Weibo of the 2022 Beijing Winter Olympics and the Chinese Football Association as case studies, the method first expands the case-related vocabulary, then cleans the data and builds text feature representations. Machine learning models then determine sentiment polarity, and the Chinese sentiment lexicon from Dalian University of Technology is used to calculate sentiment intensity. Decision tree, Naïve Bayes, and Support Vector Machine (SVM) models are each combined with bag-of-words and Word2vec features, and their prediction performance is compared. The experimental results show that the SVM model with Word2vec features achieves a sentiment classification accuracy of 84.3%, which indicates that the proposed model is effective at predicting sentiment on government Weibo and can be applied to automatic review tasks.
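The two-stage idea in the abstract (a classifier for polarity, a lexicon for intensity) can be illustrated with a minimal sketch. This is not the authors' implementation: the training comments, the lexicon entries and their scores, and the helper names `train_nb`, `predict_nb`, and `intensity` are all invented for illustration. The sketch uses a bag-of-words Naïve Bayes classifier with add-one smoothing, assumes the comments are already segmented into tokens, and assumes intensity is the mean lexicon score of the matched words, which the abstract does not specify.

```python
import math
from collections import Counter

# Toy, pre-segmented training comments (hypothetical, not data from the paper).
TRAIN = [
    (["great", "opening", "ceremony"], "pos"),
    (["proud", "of", "the", "athletes"], "pos"),
    (["wonderful", "games"], "pos"),
    (["terrible", "match"], "neg"),
    (["disappointing", "team", "again"], "neg"),
    (["awful", "coaching"], "neg"),
]

# Hypothetical stand-in for a sentiment lexicon: word -> intensity score.
# The words and scores are invented for this example.
LEXICON = {"great": 5, "proud": 7, "wonderful": 7,
           "terrible": 7, "disappointing": 5, "awful": 9}


def train_nb(samples):
    """Collect per-class document counts and bag-of-words token counts."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def intensity(tokens, lexicon):
    """Mean lexicon score of the tokens found in the lexicon (0.0 if none)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


model = train_nb(TRAIN)
comment = ["terrible", "coaching", "again"]
label = predict_nb(model, comment)
score = intensity(comment, LEXICON)
print(label, score)
```

In this sketch the classifier decides only the direction of the sentiment and the lexicon decides only its strength, so either stage could be swapped independently, for example replacing the Naïve Bayes step with an SVM over Word2vec features as compared in the paper.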

Key words: machine learning, government Weibo, sentiment intensity, sentiment analysis, sentiment classification

ZHANG Cai, MA Ziqiang, YAN Bo. Design of a Machine Learning-Based Sentiment Analysis Model for Government Weibo[J]. Computer Engineering, 2024, 50(12): 386-395.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068530

https://www.ecice06.com/EN/Y2024/V50/I12/386

Figures/Tables 16

Fig.1 Framework of sentiment analysis model

Fig.2 Implementation approach for the Web crawler

Fig.3 Uncleaned data (partial)

Fig.4 Data after cleaning (partial)

Fig.5 Word segmentation before and after expanding the corpus

Fig.6 Frequency-length histogram of positive text

Fig.7 Frequency-length histogram of negative text

Fig.8 Comparison of confusion matrices
