Computer Engineering, 2020, Vol. 46, Issue (7): 58-64, 71. doi: 10.19678/j.issn.1000-3428.0054780

• Artificial Intelligence and Pattern Recognition •

Multiple Document Summarization Method Based on Optimized Cuckoo Search Algorithm

ZHOU Shiyuan, WANG Yinglin

  1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Received: 2019-04-30  Revised: 2019-07-18  Published: 2019-08-06
  • About the authors: ZHOU Shiyuan (born 1982), male, Ph.D. candidate, with research interests in semantic analysis, document summarization and machine learning; WANG Yinglin, professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61375053).

Abstract: To maximize the information content of the generated summary, this paper proposes a multi-document summarization method based on the Cuckoo Search (CS) algorithm and a multi-objective function. The method first preprocesses the documents through sentence segmentation, word tokenization, stop-word removal and stemming, reducing them to a basic word-level form. It then computes an information score for each preprocessed sentence and uses these scores as the input of the CS algorithm. Finally, guided by the multi-objective function, it obtains the sentences that carry the key information of the original documents and assembles them into the final summary. Experimental results show that, compared with multi-document summarization methods based on the Particle Swarm Optimization (PSO) algorithm and the Double-layer K Nearest Neighbor (DKNN) algorithm, the proposed method maximizes the information content of the generated summary while maintaining high readability and low redundancy, and it achieves an average accuracy of 0.99 on the DUC benchmark dataset.

Key words: multi-document summarization, Cuckoo Search (CS) algorithm, data preprocessing, multi-objective function, information content
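The abstract only outlines the pipeline (preprocessing, sentence information scoring, CS search under a multi-objective function), so the following is a minimal Python sketch of that kind of pipeline under stated assumptions, not the authors' implementation. The stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the frequency-based information score, and the objective (total information minus a weighted Jaccard redundancy penalty, with a cap `max_sentences` on summary length) are all hypothetical choices. The paper's actual scoring and multi-objective function, including its readability term, are not given here. The search follows the general cuckoo search scheme (Lévy-flight moves drawn with Mantegna's algorithm, plus abandonment of a fraction `pa` of the worst nests) in a simplified binary form, where a positive coordinate selects its sentence.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are", "for", "on", "with"}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(documents):
    """Sentence segmentation, tokenization, stop-word removal and stemming."""
    sentences, terms = [], []
    for doc in documents:
        for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
            tokens = re.findall(r"[a-z]+", sent.lower())
            stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
            if stems:  # skip sentences left empty after filtering
                sentences.append(sent)
                terms.append(stems)
    return sentences, terms

def information_scores(terms):
    """Assumed proxy: mean corpus frequency of a sentence's terms, scaled so the max is 1."""
    freq = Counter(t for sent in terms for t in sent)
    raw = [sum(freq[t] for t in sent) / len(sent) for sent in terms]
    top = max(raw)
    return [r / top for r in raw]

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def objective(mask, scores, terms, redundancy_weight=0.5):
    """Assumed scalarized objective: total information minus weighted pairwise overlap."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    info = sum(scores[i] for i in chosen)
    redundancy = sum(jaccard(terms[i], terms[j])
                     for k, i in enumerate(chosen) for j in chosen[k + 1:])
    return info - redundancy_weight * redundancy

def to_mask(position, max_sentences):
    """Decode a nest: positive coordinates select sentences; keep only the largest ones."""
    positive = sorted((i for i, x in enumerate(position) if x > 0), key=lambda i: -position[i])
    keep = set(positive[:max_sentences])
    return [1 if i in keep else 0 for i in range(len(position))]

def levy_step(rng, beta=1.5):
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)

def cuckoo_search(scores, terms, max_sentences, n_nests=15, pa=0.25,
                  alpha=0.1, iterations=200, seed=0):
    rng, dim = random.Random(seed), len(scores)

    def fitness(pos):
        return objective(to_mask(pos, max_sentences), scores, terms)

    nests = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_nests)]
    fits = [fitness(p) for p in nests]
    k = max(range(n_nests), key=lambda i: fits[i])
    best_pos, best_fit = list(nests[k]), fits[k]
    for _ in range(iterations):
        # Levy flights: new eggs around each nest, scaled by its distance to the best nest.
        for i in range(n_nests):
            new = [x + alpha * levy_step(rng) * (x - b) for x, b in zip(nests[i], best_pos)]
            f = fitness(new)
            j = rng.randrange(n_nests)  # the egg replaces a random nest only if it is better
            if f > fits[j]:
                nests[j], fits[j] = new, f
        # Abandon the worst fraction pa of nests and rebuild them at random.
        for i in sorted(range(n_nests), key=lambda i: fits[i])[: int(pa * n_nests)]:
            nests[i] = [rng.uniform(-1, 1) for _ in range(dim)]
            fits[i] = fitness(nests[i])
        k = max(range(n_nests), key=lambda i: fits[i])
        if fits[k] > best_fit:
            best_pos, best_fit = list(nests[k]), fits[k]
    return to_mask(best_pos, max_sentences), best_fit

if __name__ == "__main__":
    docs = ["The cuckoo lays eggs in nests. Summaries are short.",
            "Cuckoo search selects sentences for the summary."]
    sentences, terms = preprocess(docs)
    scores = information_scores(terms)
    mask, fit = cuckoo_search(scores, terms, max_sentences=2)
    print([s for s, bit in zip(sentences, mask) if bit], round(fit, 4))
```

Decoding with a repair step (keeping only the `max_sentences` largest positive coordinates) guarantees that every candidate respects the length cap, so the objective itself never has to penalize over-long summaries. A fuller version would replace the scalarized objective with the paper's own multi-objective formulation.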
