
Computer Engineering ›› 2018, Vol. 44 ›› Issue (6): 169-175. doi: 10.19678/j.issn.1000-3428.0051068

• Artificial Intelligence and Recognition Technology •

Dynamic Summarization Update Method Based on Topic Signature

ZHANG Zhen, FAN Xingyue, GUO Yutian, WU Guohua

  1. School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2018-01-03 Online: 2018-06-15 Published: 2018-06-15
  • About the authors: ZHANG Zhen (born 1978), male, lecturer, M.S.; main research interests: text mining and information security. FAN Xingyue, master's student. GUO Yutian, M.S. WU Guohua, research fellow.
  • Funding:
    National ministry and commission fund.

Abstract: Current research on dynamic summarization focuses mainly on multi-document collections whose content is updated and evolves over time. Dynamic summaries, however, suffer from high redundancy and the loss of novel information, both of which degrade summary quality. To address these problems, this paper studies the Topic Signature model and builds on it to propose a new Integer Linear Programming (ILP) method for updating dynamic summaries. Each sentence is scored for topic representativeness and information diversity according to inter-sentence similarity, and the Topic Signature model is used to assess sentence novelty so that the updated, evolving information about an event can be extracted. On this basis, a summary generation strategy narrows the feasible region of the solution so that a high-quality summary can be generated in a short time. Experimental results show that the method requires no model training or language matching, and that it effectively reduces time complexity and improves summary extraction efficiency.

Key words: dynamic summarization, Topic Signature model, density peak, Integer Linear Programming (ILP) model, Natural Language Processing (NLP)

CLC Number:
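
The abstract says that the Topic Signature model is used to assess sentence novelty but gives no formulas. The sketch below is only an illustration of that idea, not the authors' implementation. It assumes the commonly described log-likelihood-ratio form of topic signatures, in which a term counts as a signature when it is significantly over-represented in the newer documents relative to the earlier ones. The function names, the 10.83 cutoff (the chi-square critical value for p < 0.001 with one degree of freedom), and the choice to score a sentence by its share of signature terms are all assumptions made for this example.

```python
import math
from collections import Counter


def _log_l(k, n, p):
    """Binomial log-likelihood of k hits in n trials at rate p (p is clamped to avoid log(0))."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def topic_signatures(new_tokens, old_tokens, threshold=10.83):
    """Return terms that are significantly over-represented in new_tokens.

    Assumption: -2 log(lambda) is compared against a chi-square cutoff;
    the default threshold is illustrative, not taken from the paper.
    """
    new_counts, old_counts = Counter(new_tokens), Counter(old_tokens)
    n1, n2 = len(new_tokens), len(old_tokens)
    signatures = set()
    for term, k1 in new_counts.items():
        k2 = old_counts.get(term, 0)
        p1 = k1 / n1
        p2 = k2 / n2 if n2 else 0.0
        p = (k1 + k2) / (n1 + n2)  # pooled rate under the null hypothesis
        llr = 2 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                   - _log_l(k1, n1, p) - _log_l(k2, n2, p))
        if llr > threshold and p1 > p2:
            signatures.add(term)
    return signatures


def novelty(sentence_tokens, signatures):
    """Hypothetical novelty score: fraction of a sentence's tokens that are signature terms."""
    if not sentence_tokens:
        return 0.0
    return sum(t in signatures for t in sentence_tokens) / len(sentence_tokens)


# Toy data: the later document set introduces the term "evacuation".
old_docs = ["storm", "rain", "wind"] * 50
new_docs = ["storm", "rain", "wind"] * 30 + ["evacuation"] * 30
signatures = topic_signatures(new_docs, old_docs)
score = novelty(["evacuation", "storm"], signatures)
```

In a pipeline like the one the abstract outlines, a score of this kind would be combined with the representativeness and diversity scores before sentence selection. How the paper weights these terms in its ILP objective is not stated here, so that step is left out.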