Computer Engineering

• Artificial Intelligence and Recognition Technology •

Parallel Ant Colony Optimization Clustering Algorithm Based on MapReduce Framework

LING Haifeng, LIU Chaochao

  1. (Key Lab of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China)
  • Received: 2014-07-22 Online: 2015-08-15 Published: 2015-08-15
  • About the authors: LING Haifeng (born 1971), male, associate professor, Ph.D., main research interests: intelligent optimization and data mining; LIU Chaochao (corresponding author), master's student.
  • Funding:
    National Basic Research Program of China ("973" Program) (2013CB329603); National Natural Science Foundation of China (71071047); Natural Science Foundation of Anhui Province (1208085MG120).

Abstract: Traditional ant colony optimization clustering algorithms face several problems when processing large-scale data: they run out of memory, they do not exploit the inherent parallelism of ant colony algorithms, and they cannot handle distributed data. To address these problems, this paper proposes a parallel Ant Colony Optimization Clustering (ACOC) algorithm. The algorithm draws on the ideas of search space replication and search space partitioning to handle large data volumes. It reads the pheromone and the data line by line, which avoids the risk of running out of memory that arises when the pheromone is loaded all at once for a large dataset. Experimental results show that the algorithm has good scalability and a high speedup when processing large-scale data.

Key words: big data, MapReduce computation framework, clustering algorithm, ant colony, parallel algorithm
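
The abstract does not give implementation details, so the following is only a minimal Python sketch of the line-by-line reading idea, not the authors' algorithm. It assumes a hypothetical layout in which the data file and the pheromone file are aligned row by row (row i of the pheromone file holds one pheromone value per cluster for data point i), and it assumes a simplified selection rule in which a point joins cluster j with probability proportional to its pheromone value alone. The names `stream_assignments` and `streamed_centroids` are illustrative.

```python
import io
import random
from typing import Iterable, Iterator, List, Optional, Tuple


def stream_assignments(
    data_lines: Iterable[str],
    pheromone_lines: Iterable[str],
    rng: random.Random,
) -> Iterator[Tuple[List[float], int]]:
    """Yield (point, cluster) pairs, holding only one row of each input at a time."""
    for data_line, pher_line in zip(data_lines, pheromone_lines):
        point = [float(v) for v in data_line.strip().split(",")]
        tau = [float(v) for v in pher_line.strip().split(",")]
        # Assumed rule: pick cluster j with probability tau[j] / sum(tau).
        cluster = rng.choices(range(len(tau)), weights=tau, k=1)[0]
        yield point, cluster


def streamed_centroids(
    data_lines: Iterable[str],
    pheromone_lines: Iterable[str],
    k: int,
    rng: random.Random,
) -> List[Optional[List[float]]]:
    """Compute cluster centroids in one pass using running sums and counts."""
    sums: List[Optional[List[float]]] = [None] * k
    counts = [0] * k
    for point, c in stream_assignments(data_lines, pheromone_lines, rng):
        if sums[c] is None:
            sums[c] = [0.0] * len(point)
        for d, value in enumerate(point):
            sums[c][d] += value
        counts[c] += 1
    # Empty clusters have no centroid and are reported as None.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]


# Usage with in-memory stand-ins for the two files (open file objects work the same way).
data = io.StringIO("0,0\n2,2\n10,10\n")
pheromone = io.StringIO("1,0\n1,0\n0,1\n")
centroids = streamed_centroids(data, pheromone, k=2, rng=random.Random(0))
```

Because both inputs are consumed as iterators, memory use depends on the number of clusters and dimensions rather than on the number of data points. In a MapReduce setting a loop of this kind could plausibly run inside a mapper over one input split, but that placement is an assumption here.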
