Computer Engineering, 2019, Vol. 45, Issue (2): 290-295. doi: 10.19678/j.issn.1000-3428.0049846

• Development Research and Engineering Application •

Classification Algorithm of Well Leakage Type Based on Optimized ID3

LI Jian (李建), FU Xiaobin (付小斌), WU Yuanyuan (吴媛媛)

  1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
  • Received: 2017-12-26 Online: 2019-02-15 Published: 2019-02-15
  • About the authors: LI Jian (born 1960), male, professor, whose main research interests are data warehousing and data mining; FU Xiaobin (corresponding author) and WU Yuanyuan, master's students.
  • Funding:

    National Science and Technology Major Project (2016ZX05020-006).

Abstract:

Decision tree algorithms perform poorly on well leakage classification for two reasons: multi-valued attributes make up a large share of the well leakage data after discretization, and the algorithms are biased toward multi-valued attributes. To address this, AFIV-ID3, an improved algorithm based on ID3, is proposed. Building on ID3, it introduces attribute importance to compute a new information entropy, where the importance of each attribute is set by the decision maker from prior or domain knowledge. It also adds an association function ratio to the information gain calculation to correct the information gain value. AFIV-ID3 overcomes the multi-value bias of ID3 and raises the weight of important attributes in the data, thereby improving the classification accuracy for well leakage types. Test results on four UCI data sets and real well leakage data show that the algorithm achieves higher classification accuracy than ID3 and C4.5, and that it raises the unstable classification accuracy of the manual experience-based method to about 72.23%.

Key words: well leakage type, ID3 algorithm, association function ratio, attribute importance, multi-value bias
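
The multi-value bias named in the abstract can be illustrated with a minimal sketch of standard ID3 information gain. This is not the AFIV-ID3 correction, whose attribute importance and association function ratio formulas are not given in the abstract; for contrast it shows the gain ratio used by C4.5. The toy data, the class labels (`type_A`, `type_B`) and the attribute names (`record_id`, `loss_rate`) are all hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each sample."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def info_gain(values, labels):
    """Standard ID3 information gain of splitting on one attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g)
                    for g in partition(values, labels).values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 gain ratio: information gain divided by the split information."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 8 samples, two made-up leakage classes.
labels = ["type_A", "type_A", "type_B", "type_B",
          "type_A", "type_B", "type_A", "type_B"]
# An attribute with a distinct value per sample (useless for prediction).
record_id = list(range(8))
# A two-valued attribute that is informative but not perfect.
loss_rate = ["low", "low", "high", "high", "low", "high", "low", "low"]

for name, attr in [("record_id", record_id), ("loss_rate", loss_rate)]:
    print(name, round(info_gain(attr, labels), 4),
          round(gain_ratio(attr, labels), 4))
```

On this toy data, plain information gain ranks the many-valued `record_id` above `loss_rate`, although `record_id` cannot generalize to new samples, while the gain ratio reverses the ranking. The abstract reports that AFIV-ID3 addresses the same bias by a different route, through the association function ratio.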

CLC Number: