
Computer Engineering (计算机工程), 2010, Vol. 36, Issue (4): 190-192. doi: 10.3969/j.issn.1000-3428.2010.04.066

• Artificial Intelligence and Recognition Technology •

Multi-document Summarization Method Based on Topic-concepts Extract

SONG Xuan-chen (宋宣辰), LIU Gui-quan (刘贵全)

  1. (Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027)
  • Online: 2010-02-20 Published: 2010-02-20

Abstract: This paper proposes an effective topic-concept extraction method for multi-document summarization. The synonymy and hypernymy/hyponymy relations in WordNet are used to perform word sense disambiguation and to construct and merge concept trees. Topic concepts are then extracted with a concept optimization algorithm. From these topic concepts, a concept Vector Space Model (VSM) is built, and the summary sentences are selected with the Maximal Marginal Relevance (MMR) method. Because the method counts semantic concepts rather than word forms, as traditional methods do, it can extract the important information in the documents more accurately. Evaluation results on DUC2005 indicate that the method produces better summaries than traditional methods.
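
The pipeline outlined in the abstract (concept counting, a concept vector space, then MMR selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_CONCEPTS` table is a hypothetical hand-written stand-in for WordNet lookups after disambiguation, the topic vector is simply the sum of all sentence vectors (an assumption), and the names `concept_vector`, `cosine`, `mmr_select` and the weight `lam` are chosen here for illustration only.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for WordNet lookups.
# In the paper's setting this mapping would come from synonymy and
# hyponymy relations after disambiguation; here it is hand-written.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "crash": "accident", "collision": "accident",
    "rain": "weather", "storm": "weather",
}

def concept_vector(sentence):
    """Count concepts instead of word forms; unknown words are ignored."""
    words = sentence.lower().split()
    return Counter(TOY_CONCEPTS[w] for w in words if w in TOY_CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(sentences, k=2, lam=0.5):
    """Greedy MMR: balance relevance to the topic against redundancy."""
    vectors = [concept_vector(s) for s in sentences]
    # Assumption: the topic is approximated by the sum of all vectors.
    topic = sum(vectors, Counter())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any chosen sentence.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * cosine(vectors[i], topic) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

In this toy setting, "car" and "automobile" fall into the same concept, so two sentences that share no content words can still be recognized as redundant, which plain word-form counting would miss.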

Key words: multi-document summarization, concept tree, concept extraction
