Computer Engineering, 2013, Vol. 39, Issue (2): 220-224. doi: 10.3969/j.issn.1000-3428.2013.02.045

• Artificial Intelligence and Recognition Technology •

Chinese Sentence Similarity Algorithm Based on Dynamic Programming

FENG Kai, WANG Xiao-hua, CHEN Zhi-qun

  1. (Institute of Computer Application Technology, Hangzhou Dianzi University, Hangzhou 310018, China)
  • Received: 2012-03-12  Revised: 2012-05-21  Online: 2013-02-15  Published: 2013-02-13
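  • About the authors: FENG Kai (1986-), male, master's student, research interest: Chinese information processing; WANG Xiao-hua, professor; CHEN Zhi-qun, associate professor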
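  • Funding:
    National Natural Science Foundation of China (61103101); Humanities and Social Sciences Research Fund of the Ministry of Education of China (12YJCZH201)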

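Abstract: Traditional algorithms for computing Chinese sentence similarity have low accuracy when the sentences contain many domain-specific terms. To address this, a Chinese sentence similarity algorithm based on dynamic programming is proposed. The algorithm obtains the set of common substrings of two sentences, applies a linked-list-based duplicate-elimination mechanism to extract all longest common substrings of the two sentences from that set, and computes the similarity from them. Experimental results show that on a question set containing many proper nouns, the algorithm reaches a test accuracy of 93.6% with high computational efficiency.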

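Key words: sentence similarity, dynamic programming, automatic question answering, longest common substring, duplicate-elimination linked list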
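The abstract only outlines the method, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It fills the standard longest-common-substring dynamic-programming table, reads off every common substring that cannot be extended further (the maximal diagonal runs), and stores them in a hypothetical linked list, `DedupList`, that drops exact duplicates and any substring contained in a longer stored one. The names `DedupList`, `maximal_common_substrings`, `coverage` and `sentence_similarity`, the `min_len` threshold, and the scoring rule (the fraction of characters in both sentences covered by the retained substrings) are all assumptions; the abstract does not state the paper's actual similarity formula.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Singly linked list node holding one candidate common substring."""
    text: str
    next: Optional["Node"] = None


class DedupList:
    """Linked list keeping only distinct, maximal substrings (assumed dedup rule)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def add(self, s: str) -> None:
        # Skip s if it equals, or is contained in, a substring already stored.
        node = self.head
        while node is not None:
            if s in node.text:
                return
            node = node.next
        # Unlink stored substrings that the new, longer s already contains.
        dummy = Node("", self.head)
        prev = dummy
        while prev.next is not None:
            if prev.next.text in s:
                prev.next = prev.next.next
            else:
                prev = prev.next
        # Prepend s to the cleaned list.
        self.head = Node(s, dummy.next)

    def __iter__(self) -> Iterator[str]:
        node = self.head
        while node is not None:
            yield node.text
            node = node.next


def maximal_common_substrings(a: str, b: str, min_len: int = 2) -> List[str]:
    """Common substrings of a and b that cannot be extended, via the DP table."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common suffix of a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
    runs: List[str] = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = dp[i][j]
            # A diagonal run ends here if either string ends or the next chars differ.
            ends = i == n or j == m or a[i] != b[j]
            if k >= min_len and ends:
                runs.append(a[i - k:i])
    return runs


def coverage(text: str, pieces: List[str]) -> int:
    """Number of characters of text covered by any occurrence of any piece."""
    covered = [False] * len(text)
    for p in pieces:
        start = text.find(p)
        while start != -1:
            for t in range(start, start + len(p)):
                covered[t] = True
            start = text.find(p, start + 1)
    return sum(covered)


def sentence_similarity(a: str, b: str, min_len: int = 2) -> float:
    """Assumed score: share of characters in both sentences covered by kept substrings."""
    if not a or not b:
        return 0.0
    dedup = DedupList()
    for s in maximal_common_substrings(a, b, min_len):
        dedup.add(s)
    pieces = list(dedup)
    return (coverage(a, pieces) + coverage(b, pieces)) / (len(a) + len(b))
```

For example, `sentence_similarity("动态规划算法", "基于动态规划的算法")` keeps "动态规划" and "算法", which cover 6 of 6 and 6 of 9 characters, giving 12/15 = 0.8. Filling the table costs O(nm) time and space for sentences of lengths n and m; setting `min_len` to 2 is one way to keep single shared function characters such as "的" from dominating the score.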
