
Computer Engineering ›› 2011, Vol. 37 ›› Issue (5): 184-186. doi: 10.3969/j.issn.1000-3428.2011.05.062

• Networks and Communications •

Denoising and Sample Reduction for Large-scale Sample Set Based on Distance of Nearest Neighbors

CHEN Sheng-bing 1, LI Long-shu 2   

  1. Department of Computer Science and Technology, Hefei University, Hefei 230601, China; 2. Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039, China
  • Online:2011-03-05 Published:2012-10-31

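  • About the authors: CHEN Sheng-bing (陈圣兵, born 1973), male, Ph.D. candidate; research interests: machine learning algorithms and artificial intelligence. LI Long-shu (李龙澍), professor and doctoral supervisor.
  • Supported by:
    National Natural Science Foundation of China (60273043); Natural Science Foundation of Anhui Province (090412054)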

Abstract: Building on an analysis of the limitations of traditional sample reduction methods, this paper proposes a new distance model together with a way to measure the intra-class and inter-class distances of samples. Using this distance model, it describes methods for noise identification and sample importance evaluation, and proposes a training sample reduction algorithm. The algorithm removes noise samples and, according to sample similarity, the inter-class distance of each sample, and the number of already-deleted samples nearby, removes less important training samples directly from the original sample space. Simulation results show that the distance model is less affected by chance and more robust to noise, and that the algorithm's reduction performance is better than that of traditional methods.

Key words: support vector, denoising, sample reduction, large-scale sample set
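
The abstract does not spell out the distance model, so the following is only a rough illustration of the general idea of nearest-neighbor-based noise identification, not the authors' algorithm. It assumes (hypothetically) that a sample's intra-class distance is the mean Euclidean distance to its k nearest same-class neighbors, that its inter-class distance is the mean distance to its k nearest other-class neighbors, and that a sample is flagged as noise when it lies closer to other classes than to its own. The names `neighbor_distances`, `flag_noise`, and the parameter `k` are illustrative choices. The importance evaluation and reduction steps are not sketched here.

```python
import numpy as np

def neighbor_distances(X, y, k=3):
    """Return (intra, inter) for every sample.

    Assumption for illustration only: intra/inter are the mean distances to
    the k nearest same-class / other-class neighbors. The paper's actual
    distance model is not given in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Full pairwise Euclidean distance matrix: O(n^2) memory, so this is
    # only suitable for a small demonstration, not a large-scale sample set.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)  # a sample is not its own neighbor
    other = y[:, None] != y[None, :]
    intra = np.empty(n)
    inter = np.empty(n)
    for i in range(n):
        d_same = np.sort(dist[i, same[i]])[:k]
        d_other = np.sort(dist[i, other[i]])[:k]
        # If a sample has no neighbors of a kind, treat that distance as infinite.
        intra[i] = d_same.mean() if d_same.size else np.inf
        inter[i] = d_other.mean() if d_other.size else np.inf
    return intra, inter

def flag_noise(X, y, k=3):
    """Flag samples that sit closer to other classes than to their own class."""
    intra, inter = neighbor_distances(X, y, k)
    return inter < intra

# Two well-separated clusters plus one point labeled 1 inside cluster 0.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11],
     [0.5, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
flags = flag_noise(X, y, k=3)
print(flags.tolist())
```

In this toy data only the last sample, which carries label 1 but sits in the middle of the class-0 cluster, is flagged. A practical version for large sample sets would replace the full distance matrix with a spatial index or an approximate nearest-neighbor search.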

