
Computer Engineering, 2014, Vol. 40, Issue (12): 45-49. doi: 10.3969/j.issn.1000-3428.2014.12.008

• Advanced Computing and Data Processing •

Research on Multi-mode Parallel Classification Algorithm Under Hadoop and Its Application

LI Yudan, ZHENG Xiaowei

  1. College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
  • Received: 2013-12-30 Revised: 2014-02-12 Online: 2014-12-15 Published: 2015-01-16
  • About the authors: LI Yudan (born 1988), female, master's student, whose main research interests are cluster systems, parallel computing, and cloud computing; ZHENG Xiaowei (corresponding author), professor and CCF senior member.
  • Supported by:
    National Natural Science Foundation of China (61373127).

Abstract: Drawing on the self-organizing, highly parallel, and nonlinear mapping characteristics of artificial neural networks, this paper proposes a cloud-computing-based multi-mode parallel classification algorithm under Hadoop, the Self-organizing Mapping Multi-back Propagation Neural Network (SOM-MBP) algorithm. It combines a Self-organizing Mapping (SOM) network with multiple parallel Back Propagation (BP) neural networks to improve learning efficiency and training accuracy on complex classification problems in multi-semantic modes. The algorithm is parallelized with the MapReduce framework on Hadoop to address the large memory overhead and long communication time incurred when training on large-scale data samples. Experimental results show that, compared with the traditional single-BP multi-output classification algorithm, the proposed algorithm trains faster and classifies more accurately, and that it is real-time and efficient when processing large-scale datasets.

Key words: Hadoop cluster, MapReduce framework, Self-organizing Mapping (SOM) network, parallel Back Propagation Neural Network (BPNN), multi-mode classification, large dataset
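
The abstract does not give implementation details, so the following is only a minimal single-machine sketch of one way a SOM stage could route samples to several independently trained BP networks in a map/shuffle/reduce pattern. Every name here (`best_matching_unit`, `map_phase`, `shuffle`, `train_bp`, `predict`), the fixed SOM weights, the network size, and the toy data are assumptions made for illustration. They are not taken from the paper, and the code does not use the Hadoop API.

```python
import math
import random
from collections import defaultdict


def best_matching_unit(som_weights, x):
    """Return the index of the SOM unit whose weight vector is closest to x."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in som_weights]
    return dists.index(min(dists))


def map_phase(samples, som_weights):
    """Map step (assumed): tag each (features, label) sample with its SOM unit."""
    for x, y in samples:
        yield best_matching_unit(som_weights, x), (x, y)


def shuffle(pairs):
    """Group mapper output by key, mimicking the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bp(group, hidden=3, epochs=200, lr=0.5, seed=0):
    """Reduce step (assumed): train one small BP network on one group."""
    rng = random.Random(seed)
    n_in = len(group[0][0])
    # One hidden layer; the last weight in each row is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in group:
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            # Backpropagate the squared-error gradient through both layers.
            delta_out = (out - y) * out * (1.0 - out)
            delta_h = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            w2 = [w - lr * delta_out * v for w, v in zip(w2, hb)]
            for j in range(hidden):
                w1[j] = [w - lr * delta_h[j] * v for w, v in zip(w1[j], xb)]
    return w1, w2


def predict(models, som_weights, x):
    """Route x to the network of its SOM unit and return that network's output."""
    w1, w2 = models[best_matching_unit(som_weights, x)]
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w2, hb)))


# Toy run: the SOM weights are hard-coded here instead of being trained.
som_weights = [[0.0, 0.0], [1.0, 1.0]]
samples = [([0.1, 0.2], 0), ([0.2, 0.1], 1), ([0.9, 0.8], 1), ([0.8, 0.9], 0)]
groups = shuffle(map_phase(samples, som_weights))
models = {unit: train_bp(group) for unit, group in groups.items()}
score = predict(models, som_weights, [0.15, 0.15])
```

In this sketch each group is trained independently, so the per-group training calls could in principle run as separate reduce tasks. How the paper actually partitions the data and synchronizes the networks is not stated in the abstract.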
