Computer Engineering

• Artificial Intelligence and Recognition Technology

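Multi-relational Naive Bayesian Classification Based on Relation Selection

BI Jiajia, ZHANG Jing

  1. (School of Computer and Information, Hefei University of Technology, Hefei 230009, China)
  • Received: 2015-03-25 Online: 2016-05-15 Published: 2016-05-13
  • About the authors: BI Jiajia (born 1989), female, master's student, main research interest: data mining; ZHANG Jing, associate professor.
  • Funding:
    National Natural Science Foundation of China (61273292, 61305063).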



Abstract: Background tables in a multi-relational database contribute to a classification task to different degrees. Based on this observation, this paper proposes a multi-relational naive Bayesian classification algorithm based on relation selection. The algorithm prunes the relation tables in two rounds. In the first round, it removes the tables that have little influence on classification according to their maximum information gain ratio. In the second round, it takes the average information gain ratio of a table as the measure of that table's contribution to classification and, according to this contribution, selects the tables used for the final classification from those that remain. Experimental results show that the algorithm effectively improves classification accuracy: compared with the Graph-NB, Classify_tables and MNBC-W algorithms, the average accuracy is improved by 2.2%, 1.1% and 0.86%, respectively.

Key words: data mining, multi-relation, classification, information gain ratio, contribution degree, relation selection
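
The two-round relation selection summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every background table has already been linked to the target table so that each attribute column is aligned row by row with the class labels, that all attributes are discrete, and that each round uses a simple cut-off. The function names and the two parameters `max_threshold` and `mean_threshold` are hypothetical, since the abstract does not state the actual selection criteria.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain ratio of one discrete attribute with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(attribute)
    if split_info == 0:  # a constant attribute carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def select_relations(tables, labels, max_threshold, mean_threshold):
    """Two-round relation selection (illustrative only).

    tables maps a table name to {attribute name: column aligned with labels}.
    Both thresholds are assumed parameters, not values taken from the paper.
    """
    scores = {
        name: [gain_ratio(column, labels) for column in columns.values()]
        for name, columns in tables.items()
    }
    # Round 1: drop tables whose best attribute is still weakly informative.
    survivors = {n: s for n, s in scores.items() if s and max(s) >= max_threshold}
    # Round 2: the mean gain ratio serves as the table's contribution degree.
    contribution = {n: sum(s) / len(s) for n, s in survivors.items()}
    selected = [n for n, c in contribution.items() if c >= mean_threshold]
    return selected, contribution
```

Calling `select_relations(tables, labels, 0.1, 0.6)` returns the names of the tables kept for classification together with the contribution of every table that survived the first round. A naive Bayesian classifier would then be built only on the attributes of the selected tables.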

CLC number: