A Method for Improving Performance of Opinion Targets Extraction by Evaluating Category Classification

Computer Engineering, 2022, Vol. 48, Issue (11): 96-103, 136. doi: 10.19678/j.issn.1000-3428.0062992

CUI Weiqi^1,2, YAN Xin^1,2, TENG Lei^3, CHEN Wei^1,2, XU Guangyi^4

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, China;
3. Hunantv.com Interactive Entertainment Media Co., Ltd., Changsha 410000, China;
4. Yunnan Nantian Electronics Information Co., Ltd., Kunming 650040, China

Received: 2021-10-19  Revised: 2021-12-24  Published: 2021-12-31
About the authors: CUI Weiqi (born 1996), male, master's student, main research interest natural language processing; YAN Xin, associate professor, M.S.; TENG Lei, M.S.; CHEN Wei, lecturer, Ph.D. candidate; XU Guangyi, senior engineer, M.S.
Funding: National Natural Science Foundation of China (61562049, 61462055).

Abstract

Abstract: Opinion target extraction is a core task in opinion mining that aims to identify the entities being evaluated in review text. Unsupervised autoencoder-based methods can uncover latent topic information in a review corpus without manually annotated data, but the opinion targets they extract lack diversity. This paper proposes a hybrid model that combines a supervised sentence-level classification task with an unsupervised autoencoder. The model trains a classifier to predict aspect categories, and the autoencoder shares the Long Short-Term Memory (LSTM)-Attention structure of the classification task to encode each sentence into a sentence vector representation, which increases semantic relatedness. The predicted aspect category is then used to transform the sentence vector representation into an intermediate-layer semantic vector, which captures the correlation between aspect categories and opinion targets and improves the encoding ability of the encoder. Finally, the model decodes by reconstructing the sentence vector to obtain an aspect matrix, and extracts opinion targets by computing the cosine similarity between the aspect matrix and the words in the sentence. Experimental results on a multi-domain review corpus show that, compared with methods such as k-means and LocLDA, the proposed method improves the evaluation metric by 3.7% in the restaurant domain and by 2.1% in the hotel domain. The method effectively alleviates the lack of aspect category diversity during training and extracts opinion targets well.

Key words: autoencoder, attention mechanism, sentence classification, Long Short-Term Memory (LSTM) model, opinion targets extraction
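
The final extraction step described in the abstract, scoring the words of a sentence against the learned aspect matrix by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D embeddings, the hand-written `ASPECT_MATRIX`, the function names, and the `threshold` selection rule are all assumptions made for illustration, since the abstract does not say how similarity scores are turned into a final selection.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_targets(
    sentence: List[str],
    word_vecs: Dict[str, Sequence[float]],
    aspect_matrix: List[Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, int, float]]:
    """Return (word, closest aspect row, similarity) for words near an aspect row."""
    hits = []
    for word in sentence:
        vec = word_vecs.get(word)
        if vec is None:  # out-of-vocabulary words are skipped
            continue
        sims = [cosine(vec, row) for row in aspect_matrix]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            hits.append((word, best, sims[best]))
    return hits


# Toy data (assumed): row 0 stands for a "food" aspect, row 1 for "service".
WORD_VECS = {
    "pizza": [0.9, 0.1],
    "waiter": [0.1, 0.95],
    "was": [0.5, 0.5],
    "the": [0.5, 0.5],
    "great": [0.6, 0.4],
}
ASPECT_MATRIX = [[1.0, 0.0], [0.0, 1.0]]

targets = extract_targets(
    ["the", "pizza", "was", "great"], WORD_VECS, ASPECT_MATRIX, threshold=0.9
)
print(targets)
```

In the model the abstract describes, the aspect matrix would be learned by reconstructing sentence vectors and the word vectors would come from pretrained embeddings; here both are fixed by hand so that the similarity step can be read in isolation.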

CLC number: TP18

CUI Weiqi, YAN Xin, TENG Lei, CHEN Wei, XU Guangyi. A Method for Improving Performance of Opinion Targets Extraction by Evaluating Category Classification[J]. Computer Engineering, 2022, 48(11): 96-103, 136.

http://www.ecice06.com/CN/Y2022/V48/I11/96
