Abstract: Existing methods for global data planning in data integration environments must first abstract a real-world model, which is highly complex and takes a long time. To address this problem, this paper proposes a database clustering algorithm based on set operations over similarity. It defines similar databases, database clustering, and clustering distance, uses these definitions to describe the database clustering process, and gives a method for evaluating the clustering effect. Analysis of a case study shows that the algorithm is simple and general.
Key words:
data integration,
database similarity,
lack of semantics,
database clustering,
clustering distance
CLC number:
ZHENG Kai, LIANG Zhuo-Ming, ZHENG Wen-Dong. Database Clustering Algorithm Based on Similarity in Data Integration Environment[J]. Computer Engineering, 2011, 37(19): 71-72,75.