
Computer Engineering, 2022, Vol. 48, Issue (6): 193-199. doi: 10.19678/j.issn.1000-3428.0061903

• Advanced Computing and Data Processing •

Optimized Realization of Sobel Edge Detection Algorithm for FT-M7002

FAN Mingliang1, GUO Zihan1, CHAI Xiaonan1, SHANG Jiandong2   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China;
  2. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
  • Received: 2021-06-11 Revised: 2021-07-14 Published: 2021-07-19

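  • About the authors: FAN Mingliang (born 1997), male, master's student, whose main research interests are high-performance computing and image processing; GUO Zihan and CHAI Xiaonan, master's students; SHANG Jiandong, professor, Ph.D., doctoral supervisor.
  • Funding:
    Sub-project of the National Key Research and Development Program of China, "Research on Key Technologies of a Global Earth Observation Results Management and Sharing Service System" (2018YFB0505000).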

Abstract: Edge detection is an important image analysis method in image processing and computer vision, and the Sobel operator, commonly used for coarse edge extraction, is widely applied to it. With the development of the domestic FT (Feiteng) series of high-performance Digital Signal Processors (DSPs), demand for FT platforms in image processing is growing, and high-performance image-processing algorithms targeting these platforms are urgently needed. To address this need, the Sobel edge detection algorithm was vectorized and optimized on the FT-M7002 platform. The Single Instruction Multiple Data (SIMD) instructions built into the FT-M7002 processor were used to exploit the data-level parallelism in the algorithm, and a parallel conversion interface between character and integer data was designed and implemented. Loop unrolling was applied to improve the instruction cycle count, and Direct Memory Access (DMA) matrix transposition was used to resolve non-contiguous memory access. Double buffering was used to overlap data transfer with kernel computation, thereby hiding the time gap between data transfer and computation. The performance of the original and optimized Sobel algorithms was compared across multiple convolution kernel sizes and image sizes. The results show that the optimized algorithm achieves a speedup of 1.66 to 3.14 times over the original algorithm and, compared with results obtained on the TMS320C6678 processor, a speedup of 1.87 to 2.08 times on the FT-M7002 platform.

Key words: edge detection, Sobel operator, high-performance Digital Signal Processor (DSP), vector parallel, loop unrolling
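For orientation, the scalar computation that the abstract describes vectorizing can be sketched as below. This is a minimal reference version under stated assumptions, not the authors' implementation: the function name `sobel3x3_ref` is hypothetical, the kernel is fixed at 3x3, the gradient magnitude is approximated as |Gx| + |Gy| clamped to 255, and border pixels are simply set to 0. The FT-M7002 SIMD intrinsics, DMA transposition, and double buffering are not shown, since the abstract does not specify them. The explicit widening of 8-bit pixels to `int` is the scalar counterpart of the character-to-integer conversion step mentioned in the abstract.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar 3x3 Sobel reference (illustrative name, not from the paper).
 * Pixels are widened from 8-bit to int before the convolution so the
 * signed intermediate sums cannot overflow. Border pixels are set to 0.
 * Output is |Gx| + |Gy|, clamped to 255 (an assumed magnitude formula). */
void sobel3x3_ref(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    for (size_t i = 0; i < width * height; ++i) {
        dst[i] = 0;
    }
    if (width < 3 || height < 3) {
        return; /* no interior pixels to process */
    }

    for (size_t y = 1; y + 1 < height; ++y) {
        const uint8_t *r0 = src + (y - 1) * width; /* row above */
        const uint8_t *r1 = src + y * width;       /* current row */
        const uint8_t *r2 = src + (y + 1) * width; /* row below */

        for (size_t x = 1; x + 1 < width; ++x) {
            /* Widen the eight neighbours from unsigned char to int. */
            int a = r0[x - 1], b = r0[x], c = r0[x + 1];
            int d = r1[x - 1],            f = r1[x + 1];
            int g = r2[x - 1], h = r2[x], k = r2[x + 1];

            int gx = (c + 2 * f + k) - (a + 2 * d + g); /* horizontal gradient */
            int gy = (g + 2 * h + k) - (a + 2 * b + c); /* vertical gradient */

            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

Each output pixel depends only on its own 3x3 neighbourhood, so iterations of the inner loop are independent of one another; this is the data-level parallelism that SIMD instructions and loop unrolling can exploit.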

