Abstract:
This paper examines how three data normalization methods (min-max normalization, z-score normalization, and decimal scaling normalization) affect the performance of the K-Nearest Neighbor (KNN) classifier. Experiments on 12 real-life UCI benchmark datasets and 1 artificial dataset show that, on most datasets, normalization improves the recognition rate of the KNN classifier. Motivated by these results, the paper investigates why normalization improves classifier performance and gives a general rule for deciding whether to normalize a dataset, based on the value distribution of its attributes.
Key words:
K-Nearest Neighbor (KNN) classifier,
data normalization method,
Euclidean distance
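As an illustration of the three normalization methods named in the abstract (min-max, z-score, and decimal scaling) and of why they can matter for a KNN classifier based on Euclidean distance, here is a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the choice to normalize training and test points together are assumptions made only for this example.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each attribute (column) linearly to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attribute: avoid division by zero
    return (X - lo) / span

def z_score_normalize(X):
    """Shift each attribute to zero mean and scale to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant attribute: avoid division by zero
    return (X - X.mean(axis=0)) / std

def decimal_scaling_normalize(X):
    """Divide each attribute by 10**j, with j the smallest integer
    such that the largest absolute value becomes less than 1."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)
    j = np.zeros_like(max_abs)
    nonzero = max_abs > 0
    j[nonzero] = np.floor(np.log10(max_abs[nonzero])) + 1
    return X / (10.0 ** j)

def knn_predict(X_train, y_train, X_test, k=1):
    """Plain KNN: Euclidean distance plus majority vote among the k nearest."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_test, dtype=float):
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Hypothetical toy data: attribute 0 separates the classes but has a small
# range; attribute 1 is uninformative but has a large range.
points = np.array([
    [0.10, 100.0],   # class 0
    [0.20, 900.0],   # class 0
    [0.90, 150.0],   # class 1
    [0.80, 950.0],   # class 1
    [0.15, 940.0],   # query point, closest to class 0 on attribute 0
])
labels = np.array([0, 0, 1, 1])

print("raw:", knn_predict(points[:4], labels, points[4:], k=1))
for normalize in (min_max_normalize, z_score_normalize, decimal_scaling_normalize):
    scaled = normalize(points)  # simplification: query scaled together with training data
    print(normalize.__name__, knn_predict(scaled[:4], labels, scaled[4:], k=1))
```

On the raw data the wide-ranging attribute dominates the Euclidean distance, so the nearest neighbor is chosen almost entirely by an uninformative attribute. After any of the three normalizations the attributes contribute on comparable scales. In practice the scaling parameters would normally be estimated from the training set only and then applied to the test points.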
CLC Number:
CAI Wei-Ling, CHEN Dong-Xia. Influence of Data Normalization Methods on K-Nearest Neighbor Classifier[J]. Computer Engineering (计算机工程), 2010, 36(22): 175-177.