
Computer Engineering ›› 2009, Vol. 35 ›› Issue (17): 4-6. doi: 10.3969/j.issn.1000-3428.2009.17.002

• Degree Paper

Improvement of Text Similarity Computing Algorithm Based on Attribute

YUAN Zheng-wu1, LI Yu-sen1, ZHANG Xue-ying2   

  1. Sino-Korea Chongqing GIS Research Center, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Key Laboratory of Virtual Geographical Environment, Ministry of Education, Nanjing Normal University, Nanjing 210046
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-09-05 Published: 2009-09-05

基于属性的文本相似度计算算法改进

袁正午1,李玉森1,张雪英2   


Abstract: Document similarity computing with the attribute barycenter coordinate model is a relatively new method, but it tends to lose semantic information and is inefficient. To resolve these problems, an improved algorithm based on the attribute barycenter coordinate is presented. The method is inspired by the satisfaction degree function in decision-making assessment theory. The new algorithm matches the intersection point of the query line and the document simplex against the document barycenter, which preserves the characteristics of the document vector in the result and improves both precision and efficiency. Experimental results show that the recall, precision, and F1 value of the model can increase by about 2%~4%.
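The abstract reports gains in recall, precision, and F1 but does not say how these measures were computed. The sketch below assumes the standard set-based definitions for a single query. The function name `precision_recall_f1` and the document IDs are hypothetical and do not come from the paper.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query.

    Assumes the standard definitions; the paper's exact evaluation
    protocol is not given in the abstract.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved

    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical document IDs, for illustration only.
    p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is commonly reported alongside the two.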

Key words: similarity computing, attribute coordinate, attribute barycenter point

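Abstract (Chinese version): The attribute-based barycentric subdivision model is a relatively novel model for computing document similarity, but it tends to lose semantic information and is inefficient. To address these problems, an improved barycentric subdivision model is proposed. It computes the similarity between the document barycenter and the intersection point of the query line with the document simplex, so that the result retains the features of the document vector in the attribute coordinate system. Experimental results show that the recall, precision, and F1 value of the model can improve by about 2%~4%.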

Keywords (Chinese version): similarity computing, attribute coordinate system, attribute barycenter point
