Abstract: Traditional clustering algorithms consider only the similarity of attributes and make little use of the relationships between objects. To address this, relationship information is converted into attributes: relational data are transformed into relational attribute data, and a method is proposed for computing the dissimilarity of relational attributes. On this basis, interval and ordinal variables are normalized, categorical variables are converted into binary variables, and relational variables are treated as binary variables. This yields a method for computing a synthesized difference degree that accounts for both attribute information and inter-object relationship information. Theoretical analysis and a clustering example show that clustering based on this difference degree is more accurate and produces more practical results.
Key words:
data mining,
attribute space clustering,
relational attribute,
synthesized difference degree
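The abstract names the preprocessing steps but does not give the paper's formulas. The code below is a minimal sketch that assumes the common textbook mixed-type dissimilarity as the combining rule, d(i, j) = sum_f delta_ij(f) * d_ij(f) / sum_f delta_ij(f). It min-max normalizes interval variables, maps ordinal ranks onto [0, 1], one-hot encodes categorical variables, and encodes each object's relations as a row of bits, treating every bit as an asymmetric binary variable (0-0 matches are ignored). All function names, the undirected-relation assumption, and the convention that each object is related to itself are illustrative assumptions, not the authors' definitions.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names; an illustrative sketch, not the authors' implementation.


def normalize_interval(values: Sequence[float]) -> List[float]:
    """Min-max scale one interval variable onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def normalize_ordinal(ranks: Sequence[int], levels: int) -> List[float]:
    """Map ranks 1..levels onto [0, 1] so they behave like interval values."""
    if levels <= 1:
        return [0.0 for _ in ranks]
    return [(r - 1) / (levels - 1) for r in ranks]


def one_hot(labels: Sequence[str]) -> List[List[int]]:
    """Convert one categorical variable into a block of binary variables."""
    categories = sorted(set(labels))
    return [[1 if lab == c else 0 for c in categories] for lab in labels]


def relation_bits(n: int, pairs: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Relational attribute: bit k of object i is 1 if i is related to k.

    Assumed: relations are undirected and each object is related to itself,
    so two directly linked objects share both of their bits.
    """
    bits = [[1 if i == k else 0 for k in range(n)] for i in range(n)]
    for a, b in pairs:
        bits[a][b] = bits[b][a] = 1
    return bits


def synthesized_dissimilarity(numeric: Sequence[Sequence[float]],
                              binary: Sequence[Sequence[int]],
                              i: int, j: int) -> float:
    """Assumed rule: d(i, j) = sum(delta * d) / sum(delta) over all variables."""
    num = den = 0.0
    for column in numeric:  # normalized interval and ordinal variables
        num += abs(column[i] - column[j])
        den += 1.0
    for x, y in zip(binary[i], binary[j]):  # asymmetric binary variables
        if x == 0 and y == 0:
            continue  # a 0-0 match carries no information (delta = 0)
        num += 0.0 if x == y else 1.0
        den += 1.0
    return num / den if den else 0.0


# Toy data: four objects with one interval, one ordinal (3 levels),
# one categorical attribute and two undirected relations.
income = normalize_interval([3000.0, 3200.0, 9000.0, 8800.0])
grade = normalize_ordinal([1, 1, 3, 2], levels=3)
region = one_hot(["north", "north", "south", "south"])
relations = relation_bits(4, [(0, 1), (2, 3)])

numeric = [income, grade]
binary = [region[k] + relations[k] for k in range(4)]
D = [[synthesized_dissimilarity(numeric, binary, i, j) for j in range(4)]
     for i in range(4)]
for row in D:
    print([round(d, 3) for d in row])
```

In this toy data, objects 0 and 1 share a region and a direct relation, so their dissimilarity is close to 0, while objects 0 and 2 differ on every normalized variable and on every non-zero bit, giving 1.0. Treating relation bits as asymmetric keeps the many absent links from making unrelated objects look similar; a symmetric matching coefficient would let those 0-0 positions dominate as the number of objects grows.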
GAO Xue-Dong, WU Ling-Yu, WU Sen, GU Shu-Juan. Synthesized Difference Degree Calculation Based on Attribute and Object Relation Information[J]. Computer Engineering, 2011, 37(22): 35-38.