Improvement of Text Similarity Computing Algorithm Based on Attribute

doi:10.3969/j.issn.1000-3428.2009.17.002

Computer Engineering, 2009, Vol. 35, Issue (17): 4-6.

YUAN Zheng-wu1, LI Yu-sen1, ZHANG Xue-ying2

(1. Sino-Korea Chongqing GIS Research Center, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Key Laboratory of Virtual Geographical Environment, Ministry of Education, Nanjing Normal University, Nanjing 210046)

Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-09-05 Published: 2009-09-05

Abstract: The attribute barycenter coordinate (barycentric subdivision) model is a relatively new model for computing document similarity, but it tends to lose semantic information and is inefficient. To address these problems, an improved attribute barycenter coordinate model is proposed, inspired by the satisfying degree function in decision-making assessment theory. The model computes the similarity between the point where the query line intersects the document simplex and the document barycenter, so that the result retains the characteristics of the document vector in the attribute coordinate system and both precision and efficiency improve. Experimental results show that the model improves recall, precision, and F1 by roughly 2% to 4%.

Keywords: similarity computing, attribute coordinate system, attribute barycenter point
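The abstract only outlines the geometry of the model, so the following Python sketch illustrates one possible reading of it rather than the authors' algorithm. All of the following are assumptions: attribute weights are non-negative; the document simplex is the standard simplex whose vertices are the unit attribute axes; the query line is the ray from the origin through the query vector, so its intersection with the simplex is the vector divided by its weight sum; the document barycenter is the center of mass of the simplex vertices weighted by the document's attribute weights (which yields the same normalization); and similarity is one minus the Euclidean distance scaled by the simplex diameter. The function names `to_simplex`, `simplex_similarity`, and `precision_recall_f1` are hypothetical; the last computes the standard precision, recall, and F1 measures the abstract reports.

```python
import math
from typing import Sequence


def to_simplex(v: Sequence[float]) -> list[float]:
    """Intersect the ray from the origin through v with the simplex sum(x) = 1.

    The result equals the barycenter of the unit attribute axes weighted by v.
    Assumption: weights are non-negative with a positive sum.
    """
    total = sum(v)
    if total <= 0 or any(x < 0 for x in v):
        raise ValueError("weights must be non-negative with a positive sum")
    return [x / total for x in v]


def simplex_similarity(query: Sequence[float], doc: Sequence[float]) -> float:
    """Hypothetical similarity in [0, 1] between two points on the simplex."""
    if len(query) != len(doc):
        raise ValueError("vectors must share one attribute coordinate system")
    q = to_simplex(query)  # query-line intersection with the simplex
    d = to_simplex(doc)    # weighted document barycenter
    # sqrt(2) is the largest distance between two points of the standard
    # simplex (the distance between any two of its vertices).
    return 1.0 - math.dist(q, d) / math.sqrt(2)


def precision_recall_f1(retrieved: set, relevant: set) -> tuple[float, float, float]:
    """Standard retrieval metrics; F1 is the harmonic mean of P and R."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, `simplex_similarity([1, 2, 1], [2, 4, 2])` returns 1.0 because both vectors meet the simplex at the same point, while single-attribute vectors such as `[1, 0]` and `[0, 1]` score approximately 0.0. Like cosine similarity, this projection ignores vector length and compares only the relative distribution of attribute weights.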

CLC number: TP391

YUAN Zheng-wu, LI Yu-sen, ZHANG Xue-ying. Improvement of Text Similarity Computing Algorithm Based on Attribute[J]. Computer Engineering, 2009, 35(17): 4-6.

https://www.ecice06.com/CN/Y2009/V35/I17/4
