
Computer Engineering ›› 2012, Vol. 38 ›› Issue (10): 38-40. doi: 10.3969/j.issn.1000-3428.2012.10.010

• Software Technology and Database •

Comparison of Data Standardization Methods in Semantic Relation Similarity Calculation

WANG Zheng-peng (王正鹏), XIE Zhi-peng (谢志鹏), QIU Pei-chao (邱培超)

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2011-07-20 Online: 2012-05-20 Published: 2012-05-20
  • About the authors: WANG Zheng-peng (born 1987), male, master's student, main research interest: Web data mining; XIE Zhi-peng, associate professor; QIU Pei-chao, master's student

Abstract: This paper studies how data standardization affects semantic relation similarity calculation. Lexical patterns are extracted from a large-scale text corpus to build a word pair-lexical pattern matrix, three data standardization methods are applied to the matrix, and a regularity-learning method is used to compute the similarity between implicit semantic relations. Experimental results show that the average classification precision is 0.87 without standardization, 0.89 with z-score standardization, 0.95 with range standardization, and 0.96 with entropy-weighted standardization.

Key words: semantic relation, similarity, lexical pattern, word pair-lexical pattern matrix, data standardization, Web data mining
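The three standardization schemes named in the abstract can be illustrated on a toy word pair-lexical pattern matrix. The sketch below is not the authors' implementation: the abstract does not give the formulas, so it assumes column-wise z-score with population standard deviation, column-wise min-max scaling for range standardization, and an entropy weight of 1 - H/log(n) per pattern column. The function names and the `counts` matrix are hypothetical.

```python
import math


def columns(matrix):
    """Return the matrix as a list of columns (one per lexical pattern)."""
    return [list(col) for col in zip(*matrix)]


def from_columns(cols):
    """Rebuild a row-major matrix (one row per word pair) from columns."""
    return [list(row) for row in zip(*cols)]


def z_score(matrix):
    """Assumed form: (x - mean) / std per column; constant columns become 0."""
    out = []
    for col in columns(matrix):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return from_columns(out)


def range_scale(matrix):
    """Assumed form: (x - min) / (max - min) per column; constant columns become 0."""
    out = []
    for col in columns(matrix):
        lo, hi = min(col), max(col)
        out.append([(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col])
    return from_columns(out)


def entropy_weight(matrix):
    """Assumed form: scale each column by 1 - H / log(n_rows).

    H is the entropy of the column's counts normalized to a distribution.
    A pattern spread evenly over all word pairs gets weight near 0;
    a pattern concentrated on one word pair gets weight 1.
    """
    n_rows = len(matrix)
    out = []
    for col in columns(matrix):
        total = sum(col)
        entropy = 0.0
        for x in col:
            if x > 0:
                p = x / total
                entropy -= p * math.log(p)
        weight = 1.0 - entropy / math.log(n_rows) if n_rows > 1 else 1.0
        out.append([x * weight for x in col])
    return from_columns(out)


# Hypothetical co-occurrence counts: rows are word pairs, columns are patterns.
counts = [
    [10, 0, 3],
    [2, 5, 3],
    [0, 5, 3],
]
```

Calling `range_scale(counts)`, `z_score(counts)`, or `entropy_weight(counts)` returns a matrix of the same shape. In this toy input the third pattern occurs equally often with every word pair, so all three schemes map it to (approximately) zero, which shows how standardization can suppress uninformative patterns before similarity is computed.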

CLC Number: