Computer Engineering ›› 2007, Vol. 33 ›› Issue (08): 51-53. doi: 10.3969/j.issn.1000-3428.2007.08.017

• Doctoral Papers •

An Information Retrieval Method Based on Language Concept Space Using Clustering Method

WU Chen1,2, ZHANG Quan2   

  1. Graduate School, Chinese Academy of Sciences, Beijing 100039; 2. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080
  • Online: 2007-04-20 Published: 2007-04-20

Abstract: This paper proposes an information retrieval method that clusters concepts in the language concept space, together with a clustering algorithm suited to the method. The algorithm uses curve fitting to determine thresholds automatically and to partition the text into clusters, then completes the clustering through iteration between clusters and a final result-revision phase. Working with concepts rather than surface words helps resolve word synonymy and polysemy. Experiments show that a retrieval system using this method achieves somewhat higher precision and recall than the Jelinek-Mercer smoothing model and the k-means model.
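The abstract names Jelinek-Mercer smoothing as a baseline and reports precision and recall, but gives no formulas. The following is a minimal sketch, assuming the standard query-likelihood form of Jelinek-Mercer smoothing, P(w|d) = (1 - λ) · tf(w, d)/|d| + λ · cf(w)/|C|, and the usual set-based definitions of precision and recall. The function names, the toy documents, and λ = 0.5 are illustrative assumptions, not taken from the paper, and the sketch does not implement the concept-space clustering method itself.

```python
import math
from collections import Counter


def jelinek_mercer_score(query, doc, coll_tf, coll_len, lam=0.5):
    """Log query likelihood of `doc` under Jelinek-Mercer smoothing.

    Each query term's probability is a linear interpolation of the
    document model and the collection model, weighted by `lam`.
    """
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term never occurs in the collection: likelihood is zero.
            return float("-inf")
        score += math.log(p)
    return score


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "concept space clustering for retrieval".split(),
    "d2": "curve fitting picks a clustering threshold".split(),
    "d3": "acoustic signal processing".split(),
}
coll_tf = Counter(term for tokens in docs.values() for term in tokens)
coll_len = sum(coll_tf.values())

query = "clustering retrieval".split()
ranked = sorted(
    docs,
    key=lambda d: jelinek_mercer_score(query, docs[d], coll_tf, coll_len),
    reverse=True,
)

# Evaluate the top two results against a hypothetical relevance judgment.
precision, recall = precision_recall(ranked[:2], relevant={"d1"})
print(ranked, precision, recall)
```

With this toy data, the document containing both query terms ranks first. Retrieving two documents when only one is relevant gives a precision of 0.5 and a recall of 1.0.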

Key words: information retrieval, language concept space, clustering, automatic threshold detection and cluster partitioning