
Computer Engineering ›› 2021, Vol. 47 ›› Issue (7): 37-43. doi: 10.19678/j.issn.1000-3428.0059943

• Research Hotspots and Reviews •

Implementation and Optimization of Canny Edge Detection Algorithm on FT Platform

GUO Hengliang1, CHAI Xiaonan2, HAN Lin1, HE Xiaohui3, SHANG Jiandong1   

  1. Henan Province Supercomputing Center, Zhengzhou University, Zhengzhou 450000, China;
  2. School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China;
  3. School of Earth Science and Technology, Zhengzhou University, Zhengzhou 450000, China
  • Received: 2020-11-09   Revised: 2020-12-10   Published: 2020-12-15

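  • About the authors: GUO Hengliang (born 1971), male, associate professor, whose main research interests are smart cities, geographic information systems, and high-performance computing; CHAI Xiaonan, master's student; HAN Lin (corresponding author), associate professor; HE Xiaohui and SHANG Jiandong, professors.
  • Supported by:
    National Key Research and Development Program of China (2018YFB0505000).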

Abstract: To support the underlying image library on the domestically developed FT (Feiteng) DSP platform and to reduce the long computation time of the original Canny edge detection algorithm, a parallel Canny gradient computation algorithm for the FT-M7002 is proposed. Building on the FT-M7002 high-performance processing architecture, Single Instruction Multiple Data (SIMD) vectorization is used to enhance the parallel processing capability of DSP core instructions. Based on the hierarchical structure of the FT-M7002 vector memory, the memory access pattern of the parallel gradient computation algorithm is analyzed. Discontinuous memory accesses are handled by addressing with an offset from the first address, and data transfer and data computation are carried out using double buffering. Experimental results show that, with the same detection accuracy as the original Canny algorithm, the proposed algorithm improves overall running speed by 1.490 to 2.112 times for convolution kernel sizes of 3×3, 5×5, and 7×7, narrowing the performance gap with mainstream accelerators in digital image processing.
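
The two memory-access ideas named in the abstract can be illustrated with a small, platform-independent sketch. It is not the authors' FT-M7002 implementation: the Sobel kernels, the |Gx| + |Gy| magnitude, and the names `gradient_rows`, `gradient_magnitude`, and `tiled_gradient` are assumptions chosen for illustration. Each kernel tap is applied to a whole row segment whose start is shifted by the tap's column offset (standing in for an offset vector load), and rows are processed in tiles that alternate between two buffers (standing in for double buffering; plain Python runs the load and the computation sequentially, whereas hardware could overlap them).

```python
# Illustrative sketch only; kernels, names, and tile size are assumptions.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_rows(image, kernel):
    """Apply a 3x3 kernel over the valid region, one output row at a time."""
    height, width = len(image), len(image[0])
    out_w = width - 2
    result = []
    for r in range(height - 2):
        acc = [0] * out_w  # plays the role of a vector accumulator
        for kr in range(3):
            row = image[r + kr]
            for kc in range(3):
                weight = kernel[kr][kc]
                if weight == 0:
                    continue
                # Offset addressing: same row, start shifted by kc elements,
                # so every tap reads one contiguous segment.
                segment = row[kc:kc + out_w]
                acc = [a + weight * s for a, s in zip(acc, segment)]
        result.append(acc)
    return result


def gradient_magnitude(image):
    """Approximate gradient magnitude as |Gx| + |Gy| (an assumed choice)."""
    gx = gradient_rows(image, SOBEL_X)
    gy = gradient_rows(image, SOBEL_Y)
    return [[abs(x) + abs(y) for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def tiled_gradient(image, tile_rows=4):
    """Process the image in row tiles, alternating between two buffers."""
    height = len(image)
    starts = list(range(0, height - 2, tile_rows))

    def load(start):
        # Stands in for a transfer into on-chip memory; the two extra rows
        # are the halo a 3x3 kernel needs.
        return [row[:] for row in image[start:start + tile_rows + 2]]

    buffers = [None, None]
    if starts:
        buffers[0] = load(starts[0])
    out = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # Fill the idle buffer with the next tile before computing.
            buffers[(i + 1) % 2] = load(starts[i + 1])
        out.extend(gradient_magnitude(buffers[i % 2]))
    return out
```

Calling `tiled_gradient(image, tile_rows=2)` on a small list-of-lists image gives the same result as `gradient_magnitude(image)`; the tiling changes only the order in which data is staged, not the arithmetic.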

Key words: FT-M7002 processor, Canny edge detection, parallel gradient computing, memory access optimization, double buffering mode

