Computer Engineering

Classification Rules Learning Algorithm Based on Selectivity

HE Tian-zhong,ZHOU Zhong-mei,HUANG Zai-xiang   

  1. (Department of Computer Science and Engineering,Minnan Normal University,Zhangzhou 363000,China)
  • Received:2013-10-15 Online:2014-08-15 Published:2014-08-15

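  • About the authors: HE Tian-zhong (何田中; born 1970), male, lecturer, M.S., main research interest: data mining; ZHOU Zhong-mei (周忠眉), professor, Ph.D.; HUANG Zai-xiang (黄再祥), lecturer, M.S.
  • Supported by:
    National Natural Science Foundation of China (61170129); Natural Science Foundation of Fujian Province (2013J01259); Zhangzhou Normal University Foundation (SK 08001).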

Abstract: Many rule-based classifiers use a single measure to select attribute values. As a result, many attribute-value pairs receive the same score, and it is difficult to tell which pair is best. In addition, rule-based classifiers usually extract only rules with 100% confidence, so rule extraction takes a long time and the resulting rules have very low support. To address these problems, this paper proposes a new measure called selectivity. Selectivity is a composite of three measures, so it can better distinguish among attribute values. Based on selectivity, the paper develops a new rule-learning algorithm, LRSM. When the number of negative instances covered by a rule falls below a threshold, LRSM stops refining that rule and goes on to extract the next one. Experimental results show that LRSM achieves high accuracy and reduces running time.

Keywords: data mining, classification, FOIL algorithm, LRSM algorithm, deviation, selectivity

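Abstract (translated from Chinese): Rule-based classifiers usually use a single measure to select attribute values. However, a single measure gives many attribute values the same score, so a "good" attribute value cannot be singled out. In addition, rule-based classifiers usually extract rules with 100% confidence, which makes rule extraction time-consuming and yields rules with low support. To address these shortcomings, a new attribute-value measure, selectivity, is proposed. Selectivity combines three measures (information entropy, class support, and deviation) and can better distinguish good attribute values from poor ones. On this basis, LRSM, a classification rule learning algorithm based on selectivity, is proposed. In LRSM, when the number of negative instances covered by a rule is less than a given threshold, the rule is extracted, the instances it covers are deleted, and the next rule is extracted. Experimental results show that, compared with the FOIL algorithm, LRSM improves classification accuracy while markedly reducing the time spent on classification.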
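The abstract names the ingredients of selectivity (information entropy, class support, and deviation) and the stopping condition of LRSM, but it does not give the formulas or say how the three measures are combined. The sketch below is therefore only an illustration under stated assumptions, not the authors' implementation: deviation is assumed to be the target-class proportion among covered instances minus its proportion in the current instance set, class support is assumed to be the fraction of target-class instances covered, and the three measures are assumed to be compared lexicographically so that a tie on one falls through to the next. All function names, the `max_negatives` parameter, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 if empty."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def selectivity(rows, attr, value, target):
    """Assumed composite score of the pair (attr, value) for class `target`.

    Returns a tuple (deviation, class support, -entropy) so that tuple
    comparison breaks ties measure by measure, or None when the pair does
    not raise the proportion of `target` (assumed filter).
    """
    covered = [label for features, label in rows if features.get(attr) == value]
    if not covered:
        return None
    pos = sum(1 for label in covered if label == target)
    n_target = sum(1 for _, label in rows if label == target)
    deviation = pos / len(covered) - n_target / len(rows)
    if deviation <= 0:
        return None
    return (deviation, pos / n_target, -entropy(covered))


def matches(features, conditions):
    """True if an instance satisfies every attribute=value test of a rule."""
    return all(features.get(a) == v for a, v in conditions.items())


def learn_rule(rows, target, max_negatives):
    """Grow one rule until it covers fewer than `max_negatives` negatives."""
    conditions, covered = {}, list(rows)
    while sum(1 for _, label in covered if label != target) >= max_negatives:
        # Sorted so that exact ties are resolved deterministically.
        candidates = sorted({(a, v) for f, _ in covered for a, v in f.items()
                             if a not in conditions})
        scored = [(selectivity(covered, a, v, target), a, v) for a, v in candidates]
        scored = [s for s in scored if s[0] is not None]
        if not scored:
            break  # no remaining attribute value improves the rule
        _, attr, value = max(scored, key=lambda s: s[0])
        conditions[attr] = value
        covered = [(f, l) for f, l in covered if f.get(attr) == value]
    return conditions, covered


def learn_rules(rows, target, max_negatives=1):
    """Extract a rule, delete the instances it covers, then extract the next."""
    remaining, rules = list(rows), []
    while any(label == target for _, label in remaining):
        conditions, covered = learn_rule(remaining, target, max_negatives)
        if sum(1 for _, label in covered if label != target) >= max_negatives:
            break  # the rule could not meet the threshold; stop
        rules.append(conditions)
        remaining = [(f, l) for f, l in remaining if not matches(f, conditions)]
    return rules


# Toy data (hypothetical): each row is (attribute values, class label).
ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "yes"}, "stay"),
]

if __name__ == "__main__":
    print(learn_rules(ROWS, "play", max_negatives=1))
```

With `max_negatives=1` a rule is accepted only when it covers no negative instances, which corresponds to the 100%-confidence setting the abstract criticizes. Raising the threshold lets each rule stop growing earlier, which is the trade-off the abstract describes between extraction time, rule support, and confidence.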

