Computer Engineering

• Artificial Intelligence and Recognition Technology •

Topic Sentence Extraction of Key News Events Based on Weighted TextRank

PU Mei, ZHOU Feng, ZHOU Jingjing, YAN Xin, ZHOU Lanjiang

  1. (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2016-06-13 Online: 2017-08-15 Published: 2017-08-15
  • About the authors: PU Mei (born 1991), female, master's student, main research interests are data mining and natural language processing; ZHOU Feng, associate professor, M.S.; ZHOU Jingjing, master's student; YAN Xin and ZHOU Lanjiang, associate professors, M.S.
  • Funding:

    National Natural Science Foundation of China, "Research on Key Technologies of Vietnamese News Event Information Extraction Based on Discourse Features" (61562049).

Abstract:

To help readers quickly find content of interest in a large volume of news, a method based on a weighted TextRank algorithm is proposed for extracting topic sentences from a single document and thereby obtaining information about key news events. The method first computes the mutual information of the keywords in each sentence of the news text, classifies the sentences of a news report into event sentences and non-event sentences, and filters out the non-event sentences. Following the idea of the TextRank algorithm, it then constructs a directed graph of event sentences and introduces three influence factors (sentence position, sentence similarity and keyword coverage frequency) to compute the influence weights between sentences. Finally, it uses the TextRank model to compute a weight for each node in the graph and selects the top-ranked sentences as the topic sentences of the key events. Experimental results show that the proposed method extracts topic sentences better than methods based on Term Frequency-Inverse Document Frequency (TF-IDF) and on news titles.

Key words: TextRank algorithm, sentence similarity, key event, topic sentence extraction, influence weight
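
The ranking step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula that combines the three influence factors, so the linear combination below, the coefficients `alpha`, `beta` and `gamma`, the position score `1 - i/n`, the bag-of-words cosine similarity, and all function names are assumptions made for illustration. The sketch also assumes the sentences are already tokenized and that non-event sentences have already been filtered out. Only the iteration itself follows the standard weighted TextRank update.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Bag-of-words cosine similarity between two token lists."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_weights(sentences, keywords, alpha=0.4, beta=0.3, gamma=0.3):
    """Return weights[j][i], the assumed influence of sentence j on sentence i.

    Assumption: the weight is a linear mix of similarity, the position score
    of the target sentence, and the share of keywords the target covers.
    """
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, target in enumerate(sentences):
        position = 1.0 - i / n  # earlier sentences score higher (assumption)
        coverage = len(set(target) & set(keywords)) / len(keywords) if keywords else 0.0
        for j, source in enumerate(sentences):
            if i != j:
                similarity = cosine_similarity(source, target)
                weights[j][i] = alpha * similarity + beta * position + gamma * coverage
    return weights


def textrank(weights, d=0.85, tol=1e-6, max_iter=100):
    """Standard weighted TextRank iteration on a directed graph.

    score(i) = (1 - d) + d * sum_j weights[j][i] / out_sum(j) * score(j)
    """
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - d) + d * rank)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank_sentences(sentences, keywords, top_k=1):
    """Return the indices of the top_k sentences and all sentence scores."""
    scores = textrank(build_weights(sentences, keywords))
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return order[:top_k], scores


if __name__ == "__main__":
    # Toy, made-up event sentences (already tokenized) and document keywords.
    event_sentences = [
        ["earthquake", "hit", "city", "monday"],
        ["earthquake", "damaged", "city", "buildings"],
        ["rescue", "teams", "reached", "city"],
    ]
    top, scores = rank_sentences(event_sentences, {"earthquake", "city"})
    print(top, scores)
```

Because the position and coverage terms depend only on the target sentence, the weight from sentence j to sentence i generally differs from the weight in the opposite direction, which is one simple way to obtain the directed graph the abstract mentions.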

CLC Number: