
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Incremental Selection of Neighborhood Size Parameter for Manifold Learning Algorithms

SHAO Chao, WAN Chun-hong, ZHAO Jing-yu

  1. (School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002, China)
  • Received: 2013-10-22 Online: 2014-08-15 Published: 2014-08-15
  • About the authors: SHAO Chao (born 1977), male, associate professor, Ph.D.; main research interests: machine learning and data visualization. WAN Chun-hong and ZHAO Jing-yu, lecturers.
  • Supported by:
    National Natural Science Foundation of China (61202285); Henan Province Basic and Frontier Technology Research Fund (112300410201); Key Basic Research Program of the Science and Technology Research Plan of the Education Department of Henan Province (13B520899).

Abstract: The success of manifold learning algorithms depends heavily on choosing a suitable neighborhood size parameter, yet in practice this parameter is usually difficult to select efficiently. To address this, this paper proposes a method that selects the neighborhood size incrementally. By the local Euclidean property of a manifold, when the neighborhood size is suitable, every neighborhood in the neighborhood graph is linear or nearly linear, so the linearity measures of all neighborhoods remain small and fall into a single cluster. Once the neighborhood size becomes unsuitable, some neighborhoods are no longer linear, and the linearity measures no longer fall into a single cluster. The method therefore runs weighted Principal Component Analysis (PCA) on each neighborhood in the neighborhood graph, uses the resulting reconstruction error as that neighborhood's linearity measure, and computes the corresponding Bayesian Information Criterion (BIC) to detect the number of clusters among all the reconstruction errors; the neighborhood size is selected incrementally on this basis. Experimental results show that the method requires no additional parameters and runs efficiently.

Key words: manifold learning, neighborhood size, local Euclidean property, weighted Principal Component Analysis (PCA), reconstruction error, Bayesian Information Criterion (BIC)
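The pipeline outlined in the abstract (weighted PCA per neighborhood, reconstruction error as the linearity measure, BIC to count clusters, neighborhood size grown step by step) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several details are assumptions that the abstract does not specify: the Gaussian distance weights and their bandwidth, the error defined as the fraction of weighted variance outside the top `d` components, the hard-assignment two-way split used to compare one cluster against two, and the function names and the `k_min`/`k_max` bounds, which are hypothetical.

```python
import numpy as np


def neighbor_order(X):
    """Sort every point's neighbors by distance once, so k can grow cheaply."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, -1.0)              # each point sorts first in its own row
    order = np.argsort(d2, axis=1)[:, 1:]   # drop the point itself
    return order, np.sqrt(np.maximum(d2, 0.0))


def reconstruction_errors(X, order, dist, k, d):
    """Weighted-PCA reconstruction error of every k-neighborhood."""
    errors = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        idx = order[i, :k]
        pts = np.vstack([X[i], X[idx]])
        r = np.concatenate([[0.0], dist[i, idx]])
        h = r[1:].mean() + 1e-12             # ASSUMPTION: bandwidth = mean distance
        w = np.exp(-(r / h) ** 2)            # ASSUMPTION: Gaussian weights
        w /= w.sum()
        centered = pts - w @ pts             # subtract the weighted mean
        cov = (centered * w[:, None]).T @ centered
        eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending order
        total = eig.sum()
        # share of weighted variance not captured by the top d components
        errors[i] = eig[:-d].sum() / total if total > 0 else 0.0
    return errors


def gaussian_bic(parts, n_total, eps=1e-12):
    """BIC of a hard-assignment 1-D Gaussian mixture over the given parts."""
    log_lik = 0.0
    for p in parts:
        var = max(p.var(), eps)              # floor avoids log(0)
        log_lik += len(p) * (np.log(len(p) / n_total)
                             - 0.5 * np.log(2.0 * np.pi * var) - 0.5)
    n_params = 3 * len(parts) - 1            # means, variances, free weights
    return -2.0 * log_lik + n_params * np.log(n_total)


def detect_clusters(values):
    """Return 1 or 2, whichever cluster count gives the lower BIC."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    # ASSUMPTION: the two-cluster candidate is the minimum-SSE split of sorted data
    sse = [v[:s].var() * s + v[s:].var() * (n - s) for s in range(2, n - 1)]
    s = 2 + int(np.argmin(sse))
    return 2 if gaussian_bic([v[:s], v[s:]], n) < gaussian_bic([v], n) else 1


def select_neighborhood_size(X, d, k_min=None, k_max=None):
    """Grow k until the errors stop forming a single cluster."""
    k_min = d + 1 if k_min is None else k_min
    k_max = X.shape[0] - 1 if k_max is None else k_max
    order, dist = neighbor_order(X)
    chosen = k_min
    for k in range(k_min, k_max + 1):
        errors = reconstruction_errors(X, order, dist, k, d)
        if detect_clusters(errors) > 1:      # some neighborhoods are no longer linear
            break
        chosen = k
    return chosen
```

Calling `select_neighborhood_size(X, d=2)` on an `(n, D)` array returns the largest tested neighborhood size for which the reconstruction errors still form one cluster under these assumptions. Sorting the neighbors once and reusing the order for each `k` is one simple way to keep the incremental search cheap; the paper may organize this step differently.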

CLC number: