Computer Engineering, 2018, Vol. 44, Issue (6): 18-23. doi: 10.19678/j.issn.1000-3428.0047207

• Advanced Computing and Data Processing •

Naive Bayesian Classification Algorithm Based on Attribute Association

NING Ke 1, SUN Tongjing 1, ZHAO Haoqiang 2

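  1. College of Automation, Hangzhou Dianzi University, Hangzhou 310018, China; 2. Zhejiang Electronic Information Products Testing Institute, Hangzhou 310007, China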
  • Received: 2017-05-15  Online: 2018-06-15  Published: 2018-06-15
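  • About the authors: NING Ke (born 1992), male, master's student, main research interest: mass data mining; SUN Tongjing, associate professor, Ph.D.; ZHAO Haoqiang, engineer.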
  • Funding: Fund of the Zhejiang Provincial Key Laboratory of Information Security (KYZ066816004).


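Abstract:

The traditional naive Bayesian classification algorithm has low accuracy on multi-dimensional continuous data. To address this, an improved algorithm based on attribute association is proposed. Multi-dimensional continuous data sets whose attributes belong to different categories are first discretized by Gaussian segmentation, and the naive Bayesian classification process is then improved with Laplace calibration (Laplace smoothing), attribute association, and attribute weighting. Experimental results show that, compared with improved algorithms based on Laplace calibration or on attribute weighting alone, the proposed algorithm achieves higher classification accuracy, and within a certain range the gain grows as the number of attributes increases. The algorithm is therefore suitable for classifying multi-dimensional continuous data.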
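The abstract names the building blocks but not their exact formulas. As a hedged sketch only, the Python below shows one common way to combine three of them. Each continuous attribute is cut into four intervals at μ−σ, μ and μ+σ (an assumed reading of "Gaussian segmentation", not confirmed by the source); priors and conditional probabilities are Laplace-smoothed by adding 1 to every count; and each attribute's log-likelihood is multiplied by a weight w_i, so the predicted class maximizes log P(c) + Σ w_i · log P(x_i | c). The attribute-association step is not reproduced because the abstract does not specify it. The names `gaussian_cut_points`, `discretize` and `WeightedNaiveBayes`, the cut points, and the default weights of 1.0 are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def gaussian_cut_points(values):
    """ASSUMED 'Gaussian segmentation': cut points at mean - std, mean, mean + std."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population std
    return [mu - sigma, mu, mu + sigma]


def discretize(value, cuts):
    """Return the index (0..len(cuts)) of the interval that `value` falls into."""
    return sum(value > c for c in cuts)


class WeightedNaiveBayes:
    """Laplace-smoothed, attribute-weighted naive Bayes on discretized data (sketch)."""

    def __init__(self, weights=None):
        self.weights = weights  # hypothetical per-attribute weights w_i; None -> all 1.0

    def fit(self, X, y):
        n_attrs = len(X[0])
        # Learn cut points per attribute, then discretize the training rows.
        self.cuts = [gaussian_cut_points([row[i] for row in X]) for i in range(n_attrs)]
        self.n_bins = [len(c) + 1 for c in self.cuts]
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][i][v]: training rows of class c whose attribute i is in bin v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for row, c in zip(X, y):
            for i, x in enumerate(row):
                self.counts[c][i][discretize(x, self.cuts[i])] += 1
        if self.weights is None:
            self.weights = [1.0] * n_attrs
        return self

    def log_score(self, row, c):
        k = len(self.class_counts)
        # Laplace-smoothed prior: (N_c + 1) / (N + K)
        score = math.log((self.class_counts[c] + 1) / (self.n + k))
        for i, x in enumerate(row):
            v = discretize(x, self.cuts[i])
            # Laplace-smoothed conditional: (N_{c,i,v} + 1) / (N_c + number of bins)
            p = (self.counts[c][i][v] + 1) / (self.class_counts[c] + self.n_bins[i])
            score += self.weights[i] * math.log(p)  # weighted log-likelihood
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


# Usage on a tiny made-up data set with two continuous attributes.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 1.0], [3.2, 1.2], [2.9, 0.8]]
y = ["a", "a", "a", "b", "b", "b"]
model = WeightedNaiveBayes().fit(X, y)
print(model.predict([1.1, 5.0]))  # -> a
print(model.predict([3.1, 1.0]))  # -> b
print(model.predict([1.1, 1.0]))  # -> b
print(WeightedNaiveBayes(weights=[1.0, 0.0]).fit(X, y).predict([1.1, 1.0]))  # -> a
```

The last two lines show why weighting matters: with equal weights the second attribute pulls `[1.1, 1.0]` toward class "b", while a weight of 0.0 on that attribute lets the first one decide. In a real weighted scheme the weights would typically be estimated from the training data rather than set by hand, and the add-one smoothing keeps an unseen bin from zeroing out a whole class score.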

Key words: continuous data, data classification, association rule, naive Bayesian classification algorithm, attribute weighting

