
Computer Engineering ›› 2023, Vol. 49 ›› Issue (5): 286-294. doi: 10.19678/j.issn.1000-3428.0063954

• Development Research and Engineering Application •

Implementation and Optimization of Parallel KNN Algorithm for Sunway Architecture

WANG Qihan1, PANG Jianmin1, YUE Feng1, ZHU Di1, SHEN Li2, XIAO Qian3

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450000, China;
  2. University of Science and Technology of China, Hefei 230000, China;
  3. Jiangnan Institute of Computing Technology, Wuxi 214000, Jiangsu, China
  • Received: 2022-02-17  Revised: 2022-04-20  Published: 2022-05-03
  • About the authors: WANG Qihan (1996-), female, M.S. candidate, mainly engaged in high-performance computing; PANG Jianmin, professor, Ph.D., doctoral supervisor; YUE Feng (corresponding author), lecturer, Ph.D.; ZHU Di, M.S. candidate; SHEN Li, associate researcher, Ph.D. candidate; XIAO Qian
  • Supported by:
    National Natural Science Foundation of China, "Research on Malicious Code Author Identification Based on Deep Learning and Computational Linguistics" (No. 61802433).

Abstract: K-Nearest Neighbor (KNN) is one of the most commonly used classification algorithms in artificial intelligence, and improving its performance matters for the organization and analysis of massive data and for big data classification. The new generation of Sunway supercomputers is in the initial stage of application development; combining the structural characteristics of the new-generation Sunway heterogeneous many-core processors with their vast computing resources to obtain an efficient KNN implementation is therefore a practical requirement of massive data analysis. Based on the structure of the SW26010pro processor, a basic parallel KNN algorithm is first implemented with the master-slave acceleration programming model, which offloads the computational kernels to the slave cores and thereby achieves thread-level parallelism. The key factors limiting the performance of this basic parallel algorithm are then analyzed, and an optimized algorithm, SWKNN, is proposed. Unlike the task-division scheme of the basic parallel algorithm, SWKNN adopts a task re-division strategy to avoid redundant computation. Data pipelining optimization, inter-slave-core communication optimization, and secondary load balancing further reduce unnecessary communication overhead, effectively relieving memory access pressure and improving performance. Experimental results show that, compared with the serial KNN algorithm, the basic parallel KNN algorithm achieves a speedup of up to 48 times on a single core group of the SW26010pro processor, and at the same data scale, SWKNN achieves a further speedup of up to 399 times over the basic parallel KNN algorithm.
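The master-slave offloading idea summarized above can be illustrated with a minimal sketch. The code below is not from the paper: it is plain, portable C, and all identifiers (NUM_SLAVE_CORES, knn_one_query, knn_blocked) are invented for illustration. It shows a serial KNN kernel and a block partition of the query set, the kind of work split a master core could hand out; on SW26010pro the outer block loop would instead be dispatched to the slave cores of a core group, with DMA transfers staging training data into each slave core's local memory.

/*
 * Illustrative sketch only -- not code from the paper. Serial KNN kernel plus
 * a block partition of the query set; on SW26010pro the outer loop over
 * blocks would be replaced by a dispatch to the core group's slave cores.
 */
#include <stdlib.h>
#include <float.h>

#define NUM_SLAVE_CORES 64   /* slave cores per SW26010pro core group (architectural assumption) */

/* Squared Euclidean distance between two dim-length feature vectors. */
static float sq_dist(const float *a, const float *b, int dim)
{
    float s = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Keep the k smallest distances (and their labels) for one query, by insertion. */
static void knn_one_query(const float *query, const float *train, const int *labels,
                          int n_train, int dim, int k, float *best_d, int *best_l)
{
    for (int j = 0; j < k; ++j) { best_d[j] = FLT_MAX; best_l[j] = -1; }
    for (int i = 0; i < n_train; ++i) {
        float d = sq_dist(query, train + (size_t)i * dim, dim);
        if (d >= best_d[k - 1])
            continue;                        /* not among the current k nearest */
        int j = k - 1;
        while (j > 0 && best_d[j - 1] > d) { /* shift larger entries right */
            best_d[j] = best_d[j - 1];
            best_l[j] = best_l[j - 1];
            --j;
        }
        best_d[j] = d;
        best_l[j] = labels[i];
    }
}

/* Master side: split the query set into contiguous blocks, one block per slave core. */
void knn_blocked(const float *queries, int n_query, const float *train, const int *labels,
                 int n_train, int dim, int k, float *all_d, int *all_l)
{
    int block = (n_query + NUM_SLAVE_CORES - 1) / NUM_SLAVE_CORES;
    for (int c = 0; c < NUM_SLAVE_CORES; ++c) {  /* conceptually: one iteration per slave core */
        int begin = c * block;
        int end = begin + block < n_query ? begin + block : n_query;
        for (int q = begin; q < end; ++q)
            knn_one_query(queries + (size_t)q * dim, train, labels, n_train, dim, k,
                          all_d + (size_t)q * k, all_l + (size_t)q * k);
    }
}

A majority vote over best_l would then assign each query's class. The sketch deliberately stops at the naive split; the task re-division, data pipelining, and secondary load balancing described in the abstract address the redundant work, memory pressure, and load imbalance that such a split leaves behind, and are not reproduced here.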

Key words: heterogeneous many-core processors, K-Nearest Neighbor (KNN) algorithm, parallel computing, algorithm optimization, classification performance

CLC Number: