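Abstract: This paper studies the effect of data standardization on semantic relation similarity calculation. Lexical patterns are extracted from a large-scale text corpus to build a word pair-lexical pattern matrix, the matrix is processed with three data standardization methods, and the similarity of implicit semantic relations is computed with a regularity learning method. Experimental results show classification accuracies of 0.87 with no standardization, 0.89 with z-score standardization, 0.95 with range standardization, and 0.96 with entropy-weighted standardization.
Keywords:
semantic relation,
similarity,
lexical pattern,
word pair-lexical pattern matrix,
data standardization,
Web data mining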
Abstract: This paper studies the influence of data standardization on semantic relation similarity calculation. It extracts lexical patterns from a huge text corpus, generates a word pair-lexical pattern matrix, applies three methods to standardize the original data matrix, and uses a regularity learning method to calculate the similarity between relations. Experimental results show that, on the classification task, the statistically significant average precision score is 0.87 without any standardization, 0.89 with z-score standardization, 0.95 with interval (range) standardization, and 0.96 with entropy-based weighting.
Key words:
semantic relation,
similarity,
lexical pattern,
word pair-lexical pattern matrix,
data standardization,
Web data mining
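
The three standardization schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: rows are taken to be word pairs and columns lexical patterns, z-score and range scaling are applied per column in their textbook forms, and the entropy weighting multiplies each column by one minus its normalized Shannon entropy, which may differ from the paper's exact definition. The function names and the toy count matrix are hypothetical.

```python
import math


def _columns(matrix):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*matrix)]


def _from_columns(cols):
    """Rebuild a row-major matrix from a list of columns."""
    return [list(row) for row in zip(*cols)]


def zscore(matrix):
    """Per-column z-score: (x - mean) / population std (assumed form)."""
    out = []
    for col in _columns(matrix):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        # A constant column carries no information; map it to zeros.
        out.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return _from_columns(out)


def range_scale(matrix):
    """Per-column range scaling to [0, 1]: (x - min) / (max - min)."""
    out = []
    for col in _columns(matrix):
        lo, hi = min(col), max(col)
        out.append([(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col])
    return _from_columns(out)


def entropy_weight(matrix):
    """Scale each column by 1 - normalized entropy (assumed scheme).

    Expects non-negative counts. A pattern spread evenly over all word
    pairs gets a weight near 0; a pattern concentrated on one word pair
    gets a weight of 1.
    """
    out = []
    for col in _columns(matrix):
        total, n = sum(col), len(col)
        if total <= 0 or n < 2:
            out.append([0.0] * n)
            continue
        probs = [x / total for x in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        out.append([x * (1.0 - entropy) for x in col])
    return _from_columns(out)


# Hypothetical counts: 3 word pairs (rows) x 3 lexical patterns (columns).
counts = [
    [4, 0, 2],
    [2, 0, 2],
    [0, 6, 2],
]

for name, fn in [("z-score", zscore), ("range", range_scale), ("entropy", entropy_weight)]:
    print(name, fn(counts))
```

In this toy matrix the third pattern occurs equally often with every word pair, so all three schemes reduce it to (near) zero, while the second pattern, which occurs with only one word pair, keeps its full weight under the entropy scheme.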
CLC number:
王正鹏, 谢志鹏, 邱培超. 语义关系相似度计算中的数据标准化方法比较[J]. 计算机工程, 2012, 38(10): 38-40.
WANG Zheng-Peng, XIE Zhi-Peng, QIU Pei-Chao. Comparison of Data Standardization Method in Semantic Relation Similarity Calculation[J]. Computer Engineering, 2012, 38(10): 38-40.