Computer Engineering

• Artificial Intelligence and Recognition Technology •

SWA-Adaboost Algorithm Based on Sampling and Weight Adjustment

GAO Jing-yang, ZHAO Yan

  1. (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029)
  • Received: 2013-09-02 Online: 2014-09-15 Published: 2014-09-12
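  • Author profiles: GAO Jing-yang (born 1966), female, associate professor, Ph.D.; main research interests: artificial intelligence and pattern recognition. ZHAO Yan, master's student.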
  • Funding:
    National Natural Science Foundation of China (51275030).

SWA-Adaboost Algorithm Based on Sampling and Weight Adjustment

GAO Jing-yang, ZHAO Yan

  1. (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)
  • Received: 2013-09-02 Online: 2014-09-15 Published: 2014-09-12

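Abstract: Since classification algorithms separate samples according to how distinguishable they are, a method is proposed that adds a sample attribute to improve sample distinguishability: in the preprocessing stage, an attribute value dmin is added to every sample to strengthen the distinction between samples. Because uneven sampling in the original Adaboost algorithm can leave some classes under-trained, an even sampling method is adopted that keeps the proportions of samples drawn from the different classes unchanged during sampling. Because sample weights grow too quickly in the original algorithm, a new weight adjustment strategy is given that introduces a misclassification counter count(n), which effectively slows the growth of sample weights. Combining these, an improved Adaboost algorithm, SWA-Adaboost, is presented. Experiments comparing the improved and original algorithms on 6 datasets from the UCI machine learning repository of the University of California show that SWA-Adaboost generalizes better than Adaboost, reducing generalization error by 9.54% on average.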

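Keywords: sample preprocessing, even sampling, weight adjustment, generalization performance, minimum distance to class center, sample distinguishability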
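The abstract names the dmin attribute and even sampling but gives no formulas for either. The Python sketch below is illustrative only and rests on two assumptions. First, dmin is taken to be the Euclidean distance from a sample to the nearest class centroid, which is suggested by the keyword "minimum distance to class center". Second, even sampling is taken to mean weighted resampling performed separately within each class, drawing as many samples per class as that class has in the training set so that class proportions stay fixed. The function names `class_centers`, `add_dmin` and `balanced_sample` are hypothetical.

```python
import math
import random
from collections import defaultdict


def class_centers(X, y):
    """Centroid (mean vector) of each class, computed on the training set only."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def add_dmin(X, centers):
    """Append an extra attribute to each sample.

    ASSUMED definition: d_min = Euclidean distance to the nearest class centroid.
    Test samples should reuse the centroids computed from the training data.
    """
    return [list(x) + [min(math.dist(x, mu) for mu in centers.values())] for x in X]


def balanced_sample(y, weights, rng):
    """Weighted resampling done class by class (assumed reading of 'even sampling').

    Each class contributes exactly as many draws as it has training samples,
    so the class proportions of the resampled set match the original set.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    picked = []
    for idx in by_class.values():
        w = [weights[i] for i in idx]
        picked.extend(rng.choices(idx, weights=w, k=len(idx)))
    return picked


# Demo on a toy two-class set.
X = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]]
y = [0, 0, 1, 1, 1]
centers = class_centers(X, y)
X_aug = add_dmin(X, centers)  # each row gains a third column, d_min
idx = balanced_sample(y, [0.2] * 5, random.Random(0))
# idx always holds 2 class-0 indices and 3 class-1 indices, whatever the weights
```

Plain weighted resampling over the whole set can, by chance, draw very few samples of a small class; drawing within each class removes that source of imbalance while still favouring high-weight samples inside each class.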

Abstract: Because classification algorithms rely on the differences among samples, a method is proposed that adds a new attribute value dmin to each sample in order to increase those differences. Because samples belonging to different classes are drawn unevenly in the sampling phase, a method called even sampling is proposed that keeps the proportions of the different classes unchanged. To slow the growth of the weights of misclassified samples, a new method is proposed that introduces a variable count(n) recording how many times a sample has been misclassified. In summary, an improved algorithm called Sampling equilibrium & Weight adjustment & Add attribute Adaboost (SWA-Adaboost) is proposed. Using 6 datasets from the UCI machine learning repository of the University of California, the original Adaboost and SWA-Adaboost are compared experimentally. Experimental results show that SWA-Adaboost has better generalization performance than the original Adaboost, with an average decrease in generalization error of 9.54%.

Key words: sample preprocessing, even sampling, weight adjustment, generalization performance, minimum distance to class center, degree of sample distinction
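The abstract does not say how count(n) enters the weight update. The Python sketch below is a hedged illustration only. It contrasts the standard discrete AdaBoost update with a hypothetical damped rule. In the standard update, misclassified weights are multiplied by exp(α) and correct ones by exp(−α), where α = ½ ln((1−ε)/ε) and ε is the weighted error. In the damped rule, the exponent for a misclassified sample n is divided by count(n), its running misclassification count. The damping rule and the function name `update_weights` are assumptions, not the authors' formula.

```python
import math


def update_weights(weights, miss, counts=None):
    """One round of boosting weight updates; returns (new_weights, alpha).

    weights: current sample weights (summing to 1)
    miss:    miss[i] is True if the weak learner misclassified sample i
    counts:  None -> standard discrete AdaBoost update
             list -> HYPOTHETICAL damped update; counts[i] is the running
                     misclassification count of sample i (mutated in place)
    """
    eps = sum(w for w, m in zip(weights, miss) if m)  # weighted error
    eps = min(max(eps, 1e-10), 1 - 1e-10)             # avoid log(0) and division by 0
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = []
    for i, (w, m) in enumerate(zip(weights, miss)):
        if not m:
            new.append(w * math.exp(-alpha))
        elif counts is None:
            new.append(w * math.exp(alpha))
        else:
            counts[i] += 1
            # assumed damping: each repeated misclassification raises the weight less
            new.append(w * math.exp(alpha / counts[i]))
    total = sum(new)
    return [w / total for w in new], alpha


# Demo: 10 samples; sample 0 is misclassified in both rounds.
n = 10
start = [1.0 / n] * n
round1 = [i in (0, 1) for i in range(n)]
round2 = [i in (0, 2, 3) for i in range(n)]

std, _ = update_weights(start, round1)
std, _ = update_weights(std, round2)

counts = [0] * n
damped, _ = update_weights(start, round1, counts)
damped, _ = update_weights(damped, round2, counts)

print(f"standard w[0] = {std[0]:.4f}, damped w[0] = {damped[0]:.4f}")
# standard w[0] = 0.3333, damped w[0] = 0.3056
```

In the first round every count is 1, so both rules give identical weights. The difference appears only for samples that are misclassified repeatedly, which is the weight growth the abstract says count(n) is meant to suppress.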
