摘要: 数据挖掘是从数据中提取有用知识的过程。在现实生活中,数据丢失的情况很常见,尤其是在商业数据库中,文件错误、记录缺失、存储策略的改变等都会引起数据丢失,从而造成数据库的不完整。这种不完整性会影响关联规则的挖掘过程,因为在有数据缺失时,规则的支持度和可信度都无法计算出确定值。把Apriori算法应用于不完整数据库,基于期望支持度和期望可信度,给出了一个挖掘不完整事务数据库中关联规则的算法。
关键词: 不完整数据库;数据挖掘;关联规则;期望支持度;期望可信度
Abstract: Data mining is the process of extracting useful knowledge from data. In practice, missing data are common in databases, especially commercial ones: file errors, missing records, and changes in the database schema can all leave a database incomplete. This incompleteness affects association rule mining, because the support and confidence of a rule cannot be computed exactly when values are missing. This paper applies the Apriori algorithm to incomplete databases and, based on expected support and expected confidence, proposes an algorithm for mining association rules in an incomplete transaction database. Experiments show that the method is efficient.
Key words: Incomplete database; Data mining; Association rules; Expected support; Expected confidence
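To illustrate why support becomes uncertain when values are missing, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical toy table in which each cell is True (item present), False (item absent), or None (value missing), and it assumes one possible reading of "expected support": each missing value is treated as independently present with a probability equal to that item's frequency among its observed values. The names `transactions`, `support_bounds`, and `expected_support` are illustrative only.

```python
from fractions import Fraction

# Hypothetical toy data: item -> True (present), False (absent), None (missing).
transactions = [
    {"bread": True,  "milk": True},
    {"bread": True,  "milk": None},
    {"bread": False, "milk": True},
    {"bread": None,  "milk": None},
]

def support_bounds(itemset, rows):
    """Lowest and highest support the itemset could take once gaps are filled."""
    n = len(rows)
    # Rows that contain the itemset no matter how missing values are filled.
    sure = sum(all(r[i] is True for i in itemset) for r in rows)
    # Rows that could contain the itemset under some filling.
    possible = sum(all(r[i] is not False for i in itemset) for r in rows)
    return Fraction(sure, n), Fraction(possible, n)

def expected_support(itemset, rows):
    """Expected support under the ASSUMED model: a missing value is present
    with probability equal to the item's frequency among observed values.
    Requires at least one observed value per item."""
    p = {}
    for i in itemset:
        observed = [r[i] for r in rows if r[i] is not None]
        p[i] = Fraction(sum(observed), len(observed))
    total = Fraction(0)
    for r in rows:
        prob = Fraction(1)
        for i in itemset:
            prob *= p[i] if r[i] is None else Fraction(int(r[i]))
        total += prob
    return total / len(rows)

low, high = support_bounds(("bread", "milk"), transactions)
exp = expected_support(("bread", "milk"), transactions)
print(low, high, exp)
```

With this toy table the support of {bread, milk} can only be bounded between 1/4 and 3/4, while the assumed probabilistic model yields a single expected value of 2/3 that an Apriori-style threshold test could use.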
印 鉴,周祥福,杨敏. 不完整数据库中的数据挖掘[J]. 计算机工程, 2006, 32(12): 34-36.
YIN Jian, ZHOU Xiangfu, YANG Min. Data Mining in Incomplete Database[J]. Computer Engineering, 2006, 32(12): 34-36.