
Computer Engineering ›› 2008, Vol. 34 ›› Issue (7): 19-22. doi: 10.3969/j.issn.1000-3428.2008.07.007

• Doctoral Dissertation •

Document Similarity Strategy Based on Document Index Graph Model

GAO Mao-ting¹, WANG Zheng-ou²

  1. Computer Science and Engineering Department, Shanghai Maritime University, Shanghai 200135; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-04-05 Published: 2008-04-05

Abstract: The Document Index Graph (DIG) is a phrase-based, graph-structured model for representing text features. It expresses text feature information more completely and accurately, and it supports incremental text clustering and information processing. Based on the DIG feature model, this paper proposes an additive strategy and a multiplicative strategy for computing document similarity, and applies transform functions to adjust the resulting similarity values. The adjustment increases the distinguishability between documents and improves the performance of text clustering, classification, and related processing. Worked examples demonstrate the effectiveness of the strategies.

Key words: text clustering, Document Index Graph (DIG), document similarity, text feature model
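
The abstract names an additive strategy, a multiplicative strategy, and a transform function but does not give their formulas. The sketch below is only an illustration of what such a combination could look like, not the paper's method. It assumes that each strategy merges a phrase-based score and a single-term score, both in [0, 1]; the function names, the weight `alpha`, and the power transform with exponent `gamma` are all hypothetical.

```python
# Illustrative sketch only: every formula here is an assumption,
# not taken from the paper.

def additive_similarity(phrase_sim: float, term_sim: float, alpha: float = 0.7) -> float:
    """Weighted sum of a phrase-based score and a single-term score (assumed form)."""
    return alpha * phrase_sim + (1.0 - alpha) * term_sim


def multiplicative_similarity(phrase_sim: float, term_sim: float, alpha: float = 0.7) -> float:
    """Weighted geometric combination of the two scores (assumed form).

    Unlike the additive form, the result is 0 whenever either score is 0
    (for 0 < alpha < 1).
    """
    return (phrase_sim ** alpha) * (term_sim ** (1.0 - alpha))


def power_transform(sim: float, gamma: float = 2.0) -> float:
    """Hypothetical transform on [0, 1]: gamma > 1 shrinks low scores
    proportionally more than high ones, widening the ratio between them."""
    return sim ** gamma


if __name__ == "__main__":
    high = additive_similarity(0.8, 0.4)   # a fairly similar pair
    low = additive_similarity(0.2, 0.3)    # a weakly similar pair
    print("before transform:", high, low, "ratio", high / low)
    print("after transform: ", power_transform(high), power_transform(low),
          "ratio", power_transform(high) / power_transform(low))
```

With `gamma` greater than 1, the ratio between a high and a low score grows after the transform, which is one simple way to read "increasing the distinguishability between documents".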

CLC Number: