Abstract: Traditional Chinese sentence similarity algorithms have low accuracy on text that contains many specialized terms. To address this, this paper proposes a Chinese sentence similarity algorithm based on dynamic programming. The algorithm builds the set of common substrings of two sentences, applies a linked-list de-duplication mechanism to that set to obtain all longest common substrings of the two sentences, and computes the similarity from them. Experimental results show that, on a question set containing many proper nouns, the algorithm achieves a test accuracy of 93.6% with high computational efficiency.
Keywords:
sentence similarity,
dynamic programming,
automatic question answering,
longest common substring,
de-duplication linked list
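
The pipeline summarized in the abstract (dynamic programming over two sentences, de-duplication of the common substrings found, then a score derived from the longest ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' published method: the function names `longest_common_substrings` and `similarity` are hypothetical, an insertion-ordered `dict` stands in for the de-duplication linked list, and the scoring formula `2 * L / (len(a) + len(b))` is a common normalization chosen because the abstract does not state the paper's own formula.

```python
def longest_common_substrings(a: str, b: str) -> list[str]:
    """Return every distinct longest common substring of a and b.

    Results are ordered by first appearance in `a`. The dict `found` is an
    insertion-ordered set; it is a stand-in (assumption) for the
    de-duplication linked list mentioned in the abstract.
    """
    best = 0                      # length of the longest match seen so far
    found: dict[str, None] = {}   # distinct substrings of length `best`
    prev = [0] * (len(b) + 1)     # DP row for a[:i-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best = curr[j]
                    found = {}    # shorter candidates are no longer longest
                if curr[j] == best:
                    found[a[i - best:i]] = None  # duplicates collapse here
        prev = curr
    return list(found)


def similarity(a: str, b: str) -> float:
    """Assumed score: 2 * (longest common substring length) / total length."""
    if not a or not b:
        return 0.0
    subs = longest_common_substrings(a, b)
    longest = len(subs[0]) if subs else 0
    return 2 * longest / (len(a) + len(b))


if __name__ == "__main__":
    print(longest_common_substrings("abcxyz", "xyzabc"))  # ['abc', 'xyz']
    print(similarity("abcxyz", "xyzabc"))                 # 0.5
```

The sketch works on characters rather than segmented words, which is one plausible reason a substring-based approach copes with specialized terms that a word segmenter might split incorrectly; whether the paper relies on this is not stated in the abstract.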
CLC number:
冯凯, 王小华, 谌志群. 基于动态规划的汉语句子相似度算法[J]. 计算机工程, 2013, 39(2): 220-224.
FENG Kai, WANG Xiao-Hua, CHEN Zhi-Qun. Chinese Sentence Similarity Algorithm Based on Dynamic Programming[J]. Computer Engineering, 2013, 39(2): 220-224.