Computer Engineering ›› 2017, Vol. 43 ›› Issue (12): 160-164, 172. doi: 10.3969/j.issn.1000-3428.2017.12.030

• Artificial Intelligence and Recognition Technology •

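  • About the authors: LIU Xule (born 1992), male, master's student, research interests: natural language processing and sentiment analysis; HE Yanxiang, professor and doctoral supervisor.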

Research on Microblog Sentiment Analysis Based on Multi-feature

LIU Xule a, HE Yanxiang a,b

  1. (a. School of Computer; b. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China)
  • Received: 2016-11-22 Online: 2017-12-15 Published: 2017-12-15

Abstract: To improve the accuracy of microblog sentiment classification, this paper takes online microblog data as its research object and proposes a graph-based method for selecting basic emotional words (seed words). Using similarity knowledge from HowNet, the method builds a graph model and selects high-quality, high-coverage basic words according to the betweenness centrality of the nodes in the graph. The selected basic words are then used to build the emotional dictionary required for sentiment analysis and to assign a polarity to each emotional word. The emotional words are also used to mine the emotional features of short sentences, and these features are added to a traditional Support Vector Machine (SVM) model. Mining more semantic information from microblog sentences yields a more reasonable semantic composition function, and capturing emotional changes within a sentence gives a better grasp of the sentiment of the microblog sentence as a whole. A Conditional Random Field (CRF) model, which can impose feature constraints, is used to classify the short sentences. Experimental results verify the effectiveness of the CRF model for short-sentence classification; compared with SVM classification methods that use multiple features, it achieves better classification performance on different data sets.

Key words: microblog, emotional word, node betweenness centrality, sentiment analysis, machine learning
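
The seed-word selection step described in the abstract can be illustrated with a minimal sketch. It assumes that two words are linked when their similarity reaches a threshold and that the words with the highest betweenness centrality are kept as seeds. The abstract does not give the exact graph construction, so the similarity scores, the threshold of 0.8, and the helper names `build_graph`, `betweenness`, and `select_seed_words` are all hypothetical and are not taken from HowNet or from the paper.

```python
from collections import deque

def build_graph(similarities, threshold):
    """Link two words when their similarity reaches the threshold (assumed rule)."""
    graph = {}
    for (u, v), score in similarities.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                               # nodes in BFS visiting order
        preds = {node: [] for node in graph}     # predecessors on shortest paths
        paths = {node: 0 for node in graph}      # number of shortest paths
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):                # accumulate dependencies
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}

def select_seed_words(scores, k):
    """Return the k words with the highest betweenness (ties broken by name)."""
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]

# Hypothetical similarity scores, for illustration only.
similarities = {
    ("good", "great"): 0.9,
    ("good", "fine"): 0.85,
    ("good", "nice"): 0.9,
    ("nice", "happy"): 0.8,
    ("great", "fine"): 0.3,   # below the threshold, so no edge
}
graph = build_graph(similarities, threshold=0.8)
scores = betweenness(graph)
seeds = select_seed_words(scores, k=2)
print(seeds)
```

In this toy graph, "good" and "nice" sit on the most shortest paths between other words, so they are selected. A production version would more likely rely on an existing graph library than on a hand-written centrality routine.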

CLC number: