
Computer Engineering, 2022, Vol. 48, Issue (11): 96-103, 136. doi: 10.19678/j.issn.1000-3428.0062992

• Artificial Intelligence and Pattern Recognition •

A Method for Improving Performance of Opinion Targets Extraction by Evaluating Category Classification

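CUI Weiqi1,2, YAN Xin1,2, TENG Lei3, CHEN Wei1,2, XU Guangyi4

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China;
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, China;
    3. Hunantv.com Interactive Entertainment Media Co., Ltd., Changsha 410000, China;
    4. Yunnan Nantian Electronics Information Co., Ltd., Kunming 650040, China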
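  • Received: 2021-10-19 Revised: 2021-12-24 Published: 2021-12-31
  • About the authors: CUI Weiqi (b. 1996), male, master's student, main research interest: natural language processing; YAN Xin, associate professor, M.S.; TENG Lei, M.S.; CHEN Wei, lecturer, Ph.D. candidate; XU Guangyi, senior engineer, M.S.
  • Funding:
    National Natural Science Foundation of China (61562049, 61462055).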

A Method for Improving Performance of Opinion Targets Extraction by Evaluating Category Classification

CUI Weiqi1,2, YAN Xin1,2, TENG Lei3, CHEN Wei1,2, XU Guangyi4   

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China;
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, China;
    3. Hunantv.com Interactive Entertainment Media Co., Ltd., Changsha 410000, China;
    4. Yunnan Nantian Electronics Information Co., Ltd., Kunming 650040, China
  • Received: 2021-10-19 Revised: 2021-12-24 Published: 2021-12-31

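Abstract: Opinion target extraction is mainly used for opinion mining of text and aims to discover the opinion target entities in review text. Unsupervised autoencoder-based methods can identify the latent topic information in a review corpus without manually annotated data, but the opinion targets extracted by the autoencoder lack diversity. This paper proposes a hybrid model that combines a supervised sentence-level classification task with an unsupervised autoencoder. The model trains a classifier to generate aspect categories. The autoencoder shares the LSTM-Attention structure of the classification task and uses it to encode sentence vector representations, which increases semantic relatedness. Based on the predicted aspect category, the sentence vector representation is transformed into an intermediate-layer semantic vector, which captures the correlation between aspect categories and opinion targets and improves the encoding ability of the encoder. Finally, decoding by reconstructing the sentence vector yields the aspect matrix, and opinion targets are extracted according to the cosine similarity between the aspect matrix and the words in the sentence. Experimental results on multi-domain review corpora show that, compared with methods such as k-means and LocLDA, the evaluation metric of the proposed method improves by 3.7% in the restaurant domain and by 2.1% in the hotel domain. The method effectively addresses the lack of aspect category diversity during training and extracts opinion targets well.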

Keywords: autoencoder, attention mechanism, sentence classification, Long Short-Term Memory (LSTM) model, opinion target extraction

Abstract: Opinion target extraction is mainly used in opinion mining to discover the opinion target entities in review texts. Methods based on an unsupervised autoencoder can identify latent topic information in a review corpus without manual annotation, but the opinion targets they extract lack diversity. This paper proposes a hybrid model that combines a supervised sentence-level classification task with an unsupervised autoencoder. The model trains a classifier to generate aspect categories. The autoencoder shares the Long Short-Term Memory (LSTM)-Attention structure of the classification task and uses it to encode each sentence into a vector representation, which increases semantic relevance. The predicted aspect category is then used to transform the sentence vector representation into an intermediate-layer semantic vector, which captures the correlation between aspect categories and opinion targets and improves the encoding ability of the encoder. The model decodes by reconstructing the sentence vector, and training yields the aspect matrix. Finally, opinion targets are extracted by calculating the cosine similarity between the aspect matrix and the words in the sentence. Experimental results on a multidomain review corpus show that, compared with k-means and Local Latent Dirichlet Allocation (LocLDA), the evaluation metric of this method improves by 3.7% in the restaurant domain and 2.1% in the hotel domain. The approach mitigates the lack of aspect category diversity during training and improves opinion target extraction.

Key words: autoencoder, attention mechanism, sentence classification, Long Short-Term Memory (LSTM) model, opinion targets extraction
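
The final step described in the abstract, extracting opinion targets by cosine similarity between the aspect matrix and the words in a sentence, can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so the function names, the `threshold` parameter, and the toy three-dimensional vectors below are all assumptions made for illustration; in the paper the aspect matrix is learned by the autoencoder rather than written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_targets(words, word_vecs, aspect_matrix, threshold=0.8):
    """Return (word, aspect_row, score) for each word whose best-matching
    aspect row reaches the (hypothetical) similarity threshold."""
    results = []
    for word, vec in zip(words, word_vecs):
        scores = [cosine(vec, row) for row in aspect_matrix]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            results.append((word, best, scores[best]))
    return results


# Toy data (hypothetical): two aspect rows and five word embeddings.
aspect_matrix = [
    [1.0, 0.0, 0.0],  # row 0: stands in for a "food" aspect
    [0.0, 1.0, 0.0],  # row 1: stands in for a "service" aspect
]
words = ["the", "pizza", "was", "great", "staff"]
word_vecs = [
    [0.1, 0.1, 0.9],
    [0.9, 0.1, 0.1],
    [0.0, 0.2, 0.9],
    [0.3, 0.3, 0.8],
    [0.1, 0.95, 0.1],
]

for word, row, score in extract_targets(words, word_vecs, aspect_matrix):
    print(f"{word}: aspect row {row}, similarity {score:.3f}")
```

With these toy vectors only "pizza" and "staff" clear the threshold, each matching a different aspect row. A fixed threshold is only one possible selection rule; taking the top-scoring words per aspect row would be an equally plausible reading of the abstract.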
