
Computer Engineering (计算机工程), 2008, Vol. 34, Issue 19: 64-66. doi: 10.3969/j.issn.1000-3428.2008.19.023

• Software Technology and Database •

Medical Webpage Clustering Based on Latent Semantic Difference

MI Xiao-fang (米晓芳), QIN Yang (秦洋), WANG Li-hong (王立宏), SONG Yi-bin (宋宜斌)

  1. (College of Computer, Yantai University, Yantai 264005)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-10-05 Published: 2008-10-05

Abstract: When medical Web pages are represented with the global or local model of latent semantic indexing (LSI), the fuzzy clusters obtained have a large inter-cluster inclusion degree. This paper proposes a new latent semantic difference model. The text of each medical Web page is extracted and represented with the global, local, and difference models in turn. The fuzzy c-means (FCM) algorithm then clusters the resulting feature vectors, and the inclusion degree between the resulting fuzzy sets is computed. Experiments on five given categories of medical Web pages show that, on average, the inclusion degree under the difference model is about 85% of that under the global model and about 80% of that under the local model.

Key words: latent semantic indexing, difference model, text mining, FCM clustering, inclusion degree
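
The pipeline summarized in the abstract (LSI representation, FCM clustering, inclusion degree between the resulting fuzzy sets) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `lsi_embed`, `fcm`, and `inclusion_degree` are hypothetical, the random term-document matrix is toy data, and the inclusion degree is assumed to be the common subsethood measure sum(min(a, b)) / sum(a), which may differ from the definition used in the paper. Only a global LSI projection is shown, because the abstract does not define the local or difference models.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the result are documents, columns are the top-k latent dimensions.
    return (np.diag(s[:k]) @ vt[:k]).T


def fcm(x, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (memberships [n, c], centers [c, d])."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u, centers


def inclusion_degree(a, b):
    """Assumed definition: degree to which fuzzy set a is contained in b."""
    return float(np.minimum(a, b).sum() / a.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    term_doc = rng.random((30, 12))  # toy data: 30 terms x 12 documents
    docs = lsi_embed(term_doc, k=3)
    u, _ = fcm(docs, c=2)
    # Inclusion degree of cluster 0 in cluster 1 (lower means better separated).
    print(inclusion_degree(u[:, 0], u[:, 1]))
```

Under this assumed definition, a lower inclusion degree between two clusters means their fuzzy membership sets overlap less, which is the sense in which the abstract reports the difference model as an improvement.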

CLC Number: