
Computer Engineering, 2020, Vol. 46, Issue (9): 89-94. doi: 10.19678/j.issn.1000-3428.0055368

• Artificial Intelligence and Pattern Recognition •

Keyword Extraction Method Based on BERT Semantic Weighting and Network Graph

LI Jun, LÜ Xueqiang

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  • Received: 2019-07-03 Revised: 2019-09-18 Published: 2019-09-29
  • About the authors: LI Jun (born 1994), male, master's student, research interest: natural language processing; LÜ Xueqiang, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61671070); Key Research Project of the State Language Commission of China (ZDI135-53).

Abstract: Combining the structural information of a document with the semantic information of external words, this paper proposes a keyword extraction method that fuses Bidirectional Encoder Representations from Transformers (BERT) word vectors with TextRank. Building on the network graph-based TextRank method, it introduces semantic difference and uses BERT word vector weighting to optimize the computation of the TextRank transition probability matrix. The words in the document are then ranked by overall influence scores obtained through iteration, and the TopN highest-scoring words are extracted as keywords. Experimental results show that when the Top3, Top5, Top7 and Top10 keywords are selected, the average F value of the proposed method is 2.5% higher than that of a keyword extraction method based on word vector clustering centroids and TextRank weighting, and the proposed method extracts keywords more efficiently.

Key words: keyword extraction, semantic relation, word vector, TextRank method, Bidirectional Encoder Representations from Transformers (BERT)
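The abstract describes the pipeline only at a high level. The following Python sketch shows one way such a pipeline could look. It builds a word co-occurrence graph whose edge weights are scaled by a semantic-difference factor computed from word vectors. It then iterates a TextRank-style update over the row-normalized transition probabilities, keeps the TopN words, and scores them with the F value used in the evaluation. Several details are assumptions and are not taken from the paper: the factor `1 - cos(v_i, v_j)`, the window size, the damping factor 0.85, and the function names `build_weighted_graph`, `weighted_textrank`, `top_n_keywords` and `f_value`. The toy 3-d vectors stand in for BERT word vectors. This is not the authors' implementation.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_weighted_graph(tokens, vectors, window=2):
    """Directed weight map for an undirected co-occurrence graph.

    Two distinct tokens co-occur when they fall inside the same run of
    `window` consecutive tokens. Each co-occurrence adds a semantic-difference
    term (1 - cosine similarity, floored at 1e-6) to the edge in both
    directions. ASSUMPTION: this weighting is illustrative, not the paper's formula.
    """
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u == w:
                continue  # no self-loops
            diff = max(1.0 - cosine(vectors[w], vectors[u]), 1e-6)
            weights[(w, u)] += diff
            weights[(u, w)] += diff
    return dict(weights)


def weighted_textrank(weights, d=0.85, tol=1e-6, max_iter=100):
    """Iterate S(i) = (1 - d) + d * sum_j P(j -> i) * S(j), where
    P(j -> i) = w_ji / sum_k w_jk is the row-normalized transition probability."""
    out_sum = defaultdict(float)
    incoming = defaultdict(list)
    for (src, dst), w in weights.items():
        out_sum[src] += w
        incoming[dst].append((src, w))
    nodes = sorted(out_sum)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new = {
            n: (1 - d) + d * sum(w / out_sum[src] * score[src] for src, w in incoming[n])
            for n in nodes
        }
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:  # stop once no score moves by more than tol
            break
    return score


def top_n_keywords(scores, n):
    """Highest-scoring words first; ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def f_value(predicted, gold):
    """Harmonic mean of precision and recall of predicted vs. gold keywords."""
    hits = len(set(predicted) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy usage: the 3-d vectors are made-up stand-ins for BERT word vectors.
tokens = ["keyword", "extraction", "graph", "ranking", "keyword", "graph", "vector"]
vectors = {
    "keyword": [0.9, 0.1, 0.0],
    "extraction": [0.8, 0.3, 0.1],
    "graph": [0.1, 0.9, 0.2],
    "ranking": [0.2, 0.7, 0.4],
    "vector": [0.0, 0.2, 0.9],
}
scores = weighted_textrank(build_weighted_graph(tokens, vectors, window=3))
top3 = top_n_keywords(scores, 3)
print(top3, f_value(top3, ["keyword", "graph", "vector"]))
```

Because every edge is stored in both directions, each node's outgoing probabilities sum to 1. The scores therefore always sum to the number of nodes, as in standard TextRank. Replacing `1 - cos` with a similarity-based factor, or blending it with raw co-occurrence counts, changes which edges dominate the ranking. The abstract does not say which variant the authors use.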
