
Computer Engineering ›› 2022, Vol. 48 ›› Issue (6): 278-287,294. doi: 10.19678/j.issn.1000-3428.0061490

• Development Research and Engineering Application •

Research on Tourist Portrait Based on Joint Topic-Sentiment Analysis

LI Qin1, LI Shaobo2, HU Jie1   

  1. College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang, 550000, China;
    2. School of Mechanical Engineering, Guizhou University, Guiyang, 550000, China
  • Received: 2021-04-27 Revised: 2021-07-19 Published: 2022-06-11

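  • About the authors: LI Qin (born 1985), female, lecturer, Ph.D., whose main research interests are natural language processing and tourism big data analysis; LI Shaobo, professor, Ph.D.; HU Jie, lecturer, Ph.D.
  • Funding:
    Guizhou Provincial Science and Technology Program (Qiankehe Jichu-ZK[2021]337); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (Qianjiaohe KY Zi [2021]141); Guizhou University of Finance and Economics Research Start-up Project for Introduced Talents (2021YJ003).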

Abstract: As the medium through which modern tourists record their perceptions and express their views, online text has become an important data source for constructing and analyzing tourist portraits. When mining tourist portraits, existing natural language processing techniques focus on tourists' needs and sentiments but lack an effective connection between the technology and tourism applications. Moreover, existing text mining techniques usually analyze the topic and the sentiment of a text separately, so the two do not point to each other and users' fine-grained opinions cannot be extracted effectively. This paper proposes a supervised joint topic-sentiment analysis model based on Variational Auto-Encoders (VAEs). Word frequency weights are introduced into the prior knowledge, and the variable parameters are constructed with a Gaussian Stick-Breaking model to effectively capture correlations in discrete data. Sentiment labels are used to assist topic training and generation, which improves the accuracy of both topic mining and sentiment prediction. The posterior distribution of the Bayesian topic model is computed with the VAE model, and sentiment classification under the topic distribution realizes the joint topic-sentiment analysis. Experimental results show that the model's average sentiment prediction accuracy is about 85% when the number of topics ranges from 10 to 100. Compared with the LDA, SAGE, and NVDM models, the proposed model can effectively mine the characteristics of hotel user reviews.

Key words: tourist portrait, Variational Auto-Encoders(VAEs), joint topic-sentiment analysis, opinion mining, Latent Dirichlet Allocation(LDA) model
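
The abstract describes a Gaussian Stick-Breaking construction followed by sentiment classification under the topic distribution. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it assumes an encoder has already produced a Gaussian mean and log standard deviation (here filled with placeholder zeros), and the names `gaussian_stick_breaking`, `sentiment_probs`, `W`, and `b` are hypothetical. The word frequency weighting and the training objective are not shown.

```python
import numpy as np

def gaussian_stick_breaking(mu, log_sigma, rng):
    """Map a Gaussian sample of size K-1 to K topic proportions."""
    # Reparameterised Gaussian sample: z = mu + sigma * eps
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_sigma) * eps
    # Squash to (0, 1): the fraction broken off the remaining stick
    eta = 1.0 / (1.0 + np.exp(-z))
    # Stick length left before each break: 1, (1-eta_1), (1-eta_1)(1-eta_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - eta)))
    # The first K-1 topics take eta_k * remaining_k; the last takes what is left
    return np.append(eta * remaining[:-1], remaining[-1])

def sentiment_probs(theta, W, b):
    """Hypothetical linear softmax classifier over the topic proportions."""
    logits = W @ theta + b
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
K = 10                           # number of topics (assumed for illustration)
mu = np.zeros(K - 1)             # placeholder encoder output: mean
log_sigma = np.zeros(K - 1)      # placeholder encoder output: log std
theta = gaussian_stick_breaking(mu, log_sigma, rng)

W = 0.1 * rng.standard_normal((2, K))  # hypothetical weights, two sentiment classes
b = np.zeros(2)
probs = sentiment_probs(theta, W, b)
```

Because the stick pieces telescope, `theta` always sums to one and can be read as a document's topic proportions; `probs` is then a sentiment distribution conditioned on those proportions.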


CLC Number: