
Computer Engineering, 2023, Vol. 49, Issue (3): 1-17. doi: 10.19678/j.issn.1000-3428.0064374

• Hot Topics and Reviews •

Review of Chinese Keyword Extraction Based on Public Opinion News

YANG Wenzhong (1, 2), DING Tiantian (1), KANG Peng (1), BU Wenxiu (1)

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China;
  2. Key Laboratory of Multilingual Information Technology in Xinjiang Uygur Autonomous Region, College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Received: 2022-04-06; Revised: 2022-08-27; Publication date: 2023-03-15; Release date: 2023-03-13
  • About the authors: YANG Wenzhong (born 1971), male, professor, whose main research interests are natural language processing and online public opinion; DING Tiantian (corresponding author), KANG Peng, and BU Wenxiu are master's students.
  • Funding:
    National Natural Science Foundation of China (U1603115, 62262065); Sub-project of the National Key Research and Development Program of China (2017YFC0820702-3); Key Science and Technology Special Project of Xinjiang Uygur Autonomous Region (2020A02001-1); Sichuan Province Regional Innovation Cooperation Project (2020YFQ0018); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01C080).

Abstract: Keyword extraction for public opinion events is one of the basic techniques of public opinion monitoring. Its goal is to extract, from different public opinion events, the core words that people are concerned with, so that the content of the news can be understood quickly. With the development of deep learning, traditional unsupervised keyword extraction techniques and the classification models used in supervised approaches have gradually been replaced by deep-learning-based sequence labeling models. This review examines the limitations of unsupervised keyword extraction, the strengths and weaknesses of classification models for keyword extraction, and how existing deep learning methods have advanced keyword extraction. Within the overall development of the field, it focuses on deep learning keyword extraction methods such as those based on convolutional neural networks and recurrent neural networks, and it summarizes the advantages, disadvantages, and development trends of existing methods. Although deep learning plays an important role in keyword extraction, it has drawbacks of its own, including dependence on large-scale labeled samples, long training time, and high complexity, which need to be addressed in future work. To validate the analysis, experiments were reproduced on six public opinion news datasets and two small datasets, and the results are consistent with the theoretical analysis presented in the paper. On this basis, keyword extraction techniques and the difficulties and challenges they face are reviewed and analyzed, and the prospects of the field are discussed in light of the existing problems.

Key words: public opinion monitoring, keyword extraction, core words, deep learning, Natural Language Processing (NLP)
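The abstract notes that classification models, which decide for each candidate word independently whether it is a keyword, have largely given way to sequence labeling models, which emit one tag per token so that multi-token keywords are recovered as contiguous spans. The minimal sketch below illustrates only that problem formulation, assuming a BIO tag scheme (`B-KW`, `I-KW`, `O`) and pre-segmented Chinese text; the tag names, the helper functions `keywords_to_bio` and `bio_to_keywords`, and the example sentence are hypothetical and not taken from the paper. The neural model that would actually predict the tags (for example, a CNN- or RNN-based tagger) is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set for this sketch:
# B-KW begins a keyword, I-KW continues it, O marks a non-keyword token.
B, I, O = "B-KW", "I-KW", "O"


def keywords_to_bio(tokens: Sequence[str],
                    keywords: Sequence[Tuple[str, ...]]) -> List[str]:
    """Encode gold keywords (given as token tuples) as one BIO tag per token."""
    tags = [O] * len(tokens)
    for kw in keywords:
        n = len(kw)
        for start in range(len(tokens) - n + 1):
            span = tuple(tokens[start:start + n])
            # Tag every occurrence, but never overwrite an already-tagged token.
            if span == kw and all(t == O for t in tags[start:start + n]):
                tags[start] = B
                for k in range(start + 1, start + n):
                    tags[k] = I
    return tags


def bio_to_keywords(tokens: Sequence[str], tags: Sequence[str],
                    joiner: str = "") -> List[str]:
    """Decode BIO tags (e.g. a tagger's predictions) into unique keyword strings.

    Chinese keywords are joined without spaces by default; pass joiner=" "
    for space-delimited languages.
    """
    keywords: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == B:
            if current:
                keywords.append(joiner.join(current))
            current = [token]
        elif tag == I and current:
            current.append(token)
        else:
            # O, or an I-KW with no preceding B-KW (treated as O in this sketch).
            if current:
                keywords.append(joiner.join(current))
            current = []
    if current:
        keywords.append(joiner.join(current))
    # Remove duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(keywords))


if __name__ == "__main__":
    # Hypothetical pre-segmented news sentence: "The observatory issues a
    # rainstorm warning, reminding citizens to pay attention to travel safety."
    tokens = ["气象台", "发布", "暴雨", "预警", ",", "提醒", "市民", "注意", "出行", "安全"]
    gold = [("暴雨", "预警"), ("出行", "安全")]
    tags = keywords_to_bio(tokens, gold)
    print(list(zip(tokens, tags)))
    print(bio_to_keywords(tokens, tags))  # ['暴雨预警', '出行安全']
```

In training, `keywords_to_bio` would turn annotated keywords into per-token targets; at inference time, `bio_to_keywords` turns the tagger's output back into a keyword list. Because the tags are assigned over the whole sequence, a two-token keyword such as "暴雨预警" (rainstorm warning) is extracted as one unit rather than as two independently classified words.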
