
Computer Engineering ›› 2011, Vol. 37 ›› Issue (01): 4-6. doi: 10.3969/j.issn.1000-3428.2011.01.002

• Doctoral Thesis •

Data Visualization Method Based on MLE and Manifold Learning

ZOU Jian 1,2, LIU Chuan-cai 1

  1. College of Computer, Nanjing University of Science and Technology, Nanjing 210094, China
  2. School of Applied Mathematics and Physics, Anhui Polytechnic University, Wuhu 241000, China
  • Online: 2011-01-05 Published: 2010-12-31
  • About the authors: ZOU Jian (born 1968), male, Ph.D. candidate; main research interests: pattern recognition, information statistics, data visualization. LIU Chuan-cai, professor, Ph.D., doctoral supervisor
  • Supported by:
    National Natural Science Foundation of China (9082004); National "863" Program of China (2006AA04Z238); Natural Science Foundation of Anhui Province (KJ 2007B056)

Abstract: The method rests on the assumption that, under a given partition of the sample space, each data set is a sample drawn from an underlying multinomial distribution. Through maximum likelihood estimation (MLE) of the model parameters, the underlying distribution of a data set is approximated by a discretized empirical distribution. Under the generalized Fisher metric on the multinomial manifold with boundary, the information divergence between underlying distributions can be approximated by the divergence between the corresponding empirical distributions, which makes unsupervised learning possible on the information manifold obtained from the MLE embedding. When the dimension of the reduced space is two or three, the natural separability of the original data sets can be shown through the dimension-reduced data. Experimental results show that the method can be applied to the visualization of high-dimensional structured data such as large-sample data sets or color images.

Key words: multinomial distribution, maximum likelihood estimation, manifold learning, data visualization
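
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the partition is a set of one-dimensional histogram bins, the distance is the standard closed-form Fisher information distance on the multinomial simplex, 2·arccos(Σ√(pᵢqᵢ)), which may differ from the generalized metric used in the paper, and classical multidimensional scaling stands in for the unspecified manifold learning step. All function names (`mle_multinomial`, `fisher_distance`, `classical_mds`, `embed_data_sets`) are hypothetical.

```python
import numpy as np


def mle_multinomial(samples, bin_edges):
    """MLE of multinomial cell probabilities: relative frequency per cell.

    Assumes at least one sample falls inside the partition.
    """
    counts, _ = np.histogram(samples, bins=bin_edges)
    return counts / counts.sum()


def fisher_distance(p, q):
    """Geodesic distance between two multinomials under the Fisher metric."""
    # Clip guards against rounding pushing the sum slightly above 1.
    affinity = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return 2.0 * np.arccos(affinity)


def classical_mds(dist, dim=2):
    """Classical MDS (assumed stand-in for the manifold learning step)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean distances) are truncated to zero.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def embed_data_sets(data_sets, bin_edges, dim=2):
    """Map each data set to one point in a dim-dimensional space."""
    probs = [mle_multinomial(s, bin_edges) for s in data_sets]
    n = len(probs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = fisher_distance(probs[i], probs[j])
    return classical_mds(dist, dim), dist


# Synthetic example: two groups of data sets drawn from different Gaussians.
rng = np.random.default_rng(0)
bin_edges = np.linspace(-6.0, 6.0, 25)  # 24 cells, an arbitrary choice
group_a = [rng.normal(-1.5, 1.0, 2000) for _ in range(5)]
group_b = [rng.normal(1.5, 1.0, 2000) for _ in range(5)]
coords, dist = embed_data_sets(group_a + group_b, bin_edges)
```

Each row of `coords` is one data set, so a scatter plot of the two columns gives the kind of 2-D view the abstract describes. The closed-form distance stays finite when some cells are empty, which is convenient on the boundary of the simplex.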
