
Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology

Method of Microblog Emotional Tendency Classification Based on AWCRF Model

CHEN Bingfeng 1, HAO Zhifeng 1,2, CAI Ruichu 1, WEN Wen 1, LIANG Lixin 1

  1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  2. School of Mathematics and Big Data, Foshan University, Foshan, Guangdong 528000, China
  • Received: 2016-10-24 Online: 2017-07-15 Published: 2017-07-15
  • About the authors: CHEN Bingfeng (born 1983), male, Ph.D. candidate, whose main research interests are data mining and natural language processing; HAO Zhifeng and CAI Ruichu, professors, Ph.D.; WEN Wen, associate professor, Ph.D.; LIANG Lixin, master's student.
  • Supported by:
    National Natural Science Foundation of China (U1501254, 61472089, 61572143); Natural Science Foundation of Guangdong Province (2014A030306004, 2014A030308008); Science and Technology Planning Project of Guangdong Province (2015B010108006); Young Innovative Talents Project of the Department of Education of Guangdong Province (2015KQNCX027).

Abstract: To address the classification of Chinese microblog sentiment data whose class distribution is imbalanced, this paper proposes a method that combines the Affinity Propagation (AP) algorithm, Word2vec, and the Conditional Random Field (CRF) model. The AP algorithm clusters the microblog data, dividing the majority-class samples into several clusters according to a similarity measure so that the distance between clusters is maximized and the distance within each cluster is minimized. Undersampling is then used to build a training set in which the sentiment tendencies are evenly distributed. Word2vec is used to find the text with the highest semantic similarity, which is added to each microblog sentence to enrich its sentiment information. The CRF model then computes the label sequence on the balanced and expanded training set, so that the sentiment tendency of microblogs can be classified accurately even when the sentiment distribution of the dataset is imbalanced. Experimental results show that the proposed method outperforms the ACRF, CRF, and SCRF methods on recall and G-mean.
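The balancing step in the abstract can be illustrated with a small sketch. It assumes that cluster labels for the majority-class samples are already available (for example from an AP clustering run) and that the target size equals the minority-class size. The function name `undersample_by_cluster` and the round-robin draw across clusters are assumptions of this sketch; the abstract does not state how samples are drawn from each cluster.

```python
import random
from collections import defaultdict


def undersample_by_cluster(majority, cluster_ids, target_size, seed=0):
    """Keep target_size majority samples, spread across clusters.

    majority    : list of majority-class samples (any objects)
    cluster_ids : one cluster label per sample (assumed to come from clustering)
    target_size : number of samples to keep, e.g. the minority-class size
    """
    if len(majority) != len(cluster_ids):
        raise ValueError("majority and cluster_ids must have the same length")
    if target_size >= len(majority):
        return list(majority)  # nothing to remove

    rng = random.Random(seed)
    clusters = defaultdict(list)
    for sample, cid in zip(majority, cluster_ids):
        clusters[cid].append(sample)

    # Shuffle inside each cluster so the draw within a cluster is random.
    pools = []
    for cid in sorted(clusters):
        pool = clusters[cid][:]
        rng.shuffle(pool)
        pools.append(pool)

    # Round-robin over clusters (this sketch's choice): every cluster
    # contributes one sample before any cluster contributes a second one.
    kept = []
    while len(kept) < target_size:
        for pool in pools:
            if pool and len(kept) < target_size:
                kept.append(pool.pop())
    return kept
```

Drawing from every cluster keeps the reduced majority class representative of its sub-groups, which plain random undersampling does not guarantee.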

Key words: sentiment analysis, sentiment classification, Affinity Propagation (AP) algorithm, undersampling technique, Word2vec technique, Conditional Random Field (CRF)
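The abstract reports results on recall and G-mean. As a minimal sketch, the code below uses the common definition of G-mean as the geometric mean of the per-class recalls; the abstract does not give the exact formula used in the paper, so this definition and the helper names `per_class_recall` and `g_mean` are assumptions.

```python
import math


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))
```

With 8 positive and 2 negative samples, a classifier that always predicts "pos" reaches 80% accuracy but a G-mean of 0, because the recall on the minority class is 0. This is why G-mean is a natural criterion for imbalanced data.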

CLC Number: