Computer Engineering, 2012, Vol. 38, Issue 19: 167-169, 174. doi: 10.3969/j.issn.1000-3428.2012.19.043

• Artificial Intelligence and Recognition Technology •

An Algorithm of Representative Point Selection Based on Confidence

HUANG Yun 1,2, HONG Jia-ming 2, QIN Zun-yue 1,2

  1. School of Software, Jishou University, Zhangjiajie 427000, China; 2. School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510006, China
  • Received: 2012-01-09 Online: 2012-10-05 Published: 2012-09-29
  • About the authors: HUANG Yun (born 1976), male, lecturer and Ph.D. candidate, main research interests: data mining and artificial intelligence; HONG Jia-ming, Ph.D. candidate; QIN Zun-yue, associate professor and Ph.D. candidate

Abstract: Representative point selection reduces the number of training instances used by nearest neighbor classifiers, which can improve both classification accuracy and execution efficiency. This paper introduces the concept of classification confidence entropy and uses it to build a fitness evaluation function that scores candidate sets of representative points; a genetic algorithm then searches for the best set under that function. The concept can also be combined with other representative point selection methods to improve their performance. In experimental comparisons with several classical representative point selection algorithms, the confidence-based approach shows some advantage in both classification accuracy and data reduction rate.

Key words: confidence entropy, fitness evaluation function, representative point selection, k-nearest neighbor, semi-supervised learning, genetic algorithm
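
The abstract does not give the formula for the confidence entropy or the fitness function, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that the "confidence" of a prediction is the distribution of class votes among the k nearest representative points, that its entropy is the ordinary Shannon entropy, and that the fitness of a candidate subset is a weighted sum of a confidence-weighted accuracy term and the data reduction rate. The function names, the 0/1 mask encoding, and the weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def knn_vote_distribution(query, prototypes, labels, k=3):
    """Return the class-vote fractions among the k nearest prototypes."""
    # Rank prototypes by squared Euclidean distance to the query point.
    order = sorted(
        range(len(prototypes)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, prototypes[i])),
    )
    nearest = order[:k]  # fewer than k if the subset is very small
    votes = Counter(labels[i] for i in nearest)
    return {cls: n / len(nearest) for cls, n in votes.items()}


def confidence_entropy(distribution):
    """Shannon entropy in bits of a vote distribution; 0 means a unanimous vote."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def fitness(mask, X, y, k=3, alpha=0.5):
    """Score a candidate subset encoded as a 0/1 mask (higher is better).

    ASSUMPTION: this weighting is illustrative, not the paper's formula.
    """
    prototypes = [x for x, keep in zip(X, mask) if keep]
    proto_labels = [c for c, keep in zip(y, mask) if keep]
    if not prototypes:
        return 0.0  # an empty subset cannot classify anything

    n_classes = len(set(y))
    max_entropy = math.log2(n_classes) if n_classes > 1 else 1.0

    quality = 0.0
    for x, true_label in zip(X, y):
        dist = knn_vote_distribution(x, prototypes, proto_labels, k)
        predicted = max(dist, key=dist.get)  # ties go to the nearest prototype's class
        if predicted == true_label:
            # Correct predictions earn more credit when the vote is confident.
            quality += 1.0 - confidence_entropy(dist) / max_entropy
    quality /= len(X)

    reduction = 1.0 - len(prototypes) / len(X)  # fraction of instances discarded
    return alpha * quality + (1.0 - alpha) * reduction
```

Under these assumptions, a genetic algorithm would treat each 0/1 mask as a chromosome and keep the masks with the highest `fitness` value. For two well-separated clusters of three points each, a mask that keeps one point per cluster scores higher with `k=1` than the full data set does, because it classifies every point confidently while discarding two thirds of the instances.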

CLC Number: