
Computer Engineering


Optimization of Digital Signal Transformation Functions in Multicluster VLIW DSP

ZHEN Yang a,b,c, GU Naijie a,b,c, YE Hong a,b,c

  1. (a. School of Computer Science and Technology; b. Anhui Province Key Laboratory of Computing and Communication Software; c. Institute of Advanced Technology, University of Science and Technology of China, Hefei 230027, China)
  • Received: 2015-03-12  Online: 2016-03-15  Published: 2016-03-15

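  • About the authors: ZHEN Yang (born 1991), male, master's student, research interest: software parallel optimization; GU Naijie (corresponding author), professor and doctoral supervisor; YE Hong, doctoral student.
  • Funding:

    Program of Introducing Talents of Discipline to Universities (B07033); Anhui Provincial Natural Science Foundation project "Research on Parallel Deployment and Optimization Strategies for Deep Neural Networks on GPU Clusters" (1408085MKL06).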

Abstract:

Based on the architectural characteristics of the BWDSP100 processor, this paper presents several practical techniques for improving the performance of the digital signal transformation functions in its Digital Signal Processor (DSP) function library: special assembly instructions, instruction reordering, zero-overhead loop instructions, Instruction-Level Parallelism (ILP), software vectorization, and software pipelining. Parallel optimized versions are implemented in the library on top of the original sequential versions. Experimental results show that, in four-macro parallel mode, every digital signal transformation function achieves a speedup of at least 9x, 90% of the functions exceed 10x, and the average speedup is 11.12x.
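As an illustration of the general idea behind loop unrolling for instruction-level parallelism, the following is a minimal sketch in portable C. It is not the paper's code and not BWDSP100 assembly. The function names `dot_seq` and `dot_unrolled4` are hypothetical, and the choice of a multiply-accumulate kernel with four independent accumulators is an assumption made only to echo the four-macro setting described in the abstract.

```c
#include <stddef.h>

/* Sequential reference version: one multiply-accumulate per iteration.
 * Each iteration depends on the previous value of acc. */
float dot_seq(const float *x, const float *w, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * w[i];
    return acc;
}

/* Hypothetical 4-way unrolled version: four independent accumulators
 * break the single dependency chain, so a VLIW scheduler (or compiler)
 * is free to issue the four multiply-accumulates in parallel. */
float dot_unrolled4(const float *x, const float *w, size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;

    /* Main unrolled loop: processes four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        a0 += x[i]     * w[i];
        a1 += x[i + 1] * w[i + 1];
        a2 += x[i + 2] * w[i + 2];
        a3 += x[i + 3] * w[i + 3];
    }

    /* Combine the partial sums. */
    float acc = (a0 + a1) + (a2 + a3);

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; ++i)
        acc += x[i] * w[i];

    return acc;
}
```

Because the unrolled version sums in a different order, its floating-point result can differ from the sequential one in the last bits for general inputs. The two agree exactly only when every intermediate value is exactly representable.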

Key words: Very Long Instruction Word (VLIW), Single Instruction Multiple Data (SIMD), Digital Signal Processor (DSP), loop unrolling, parallelization, multicluster

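Abstract (Chinese version):

Targeting the architectural characteristics of the BWDSP100, and building on parallel optimization techniques such as loop unrolling, instruction scheduling, and software pipelining, this work parallelizes the functions in the BWDSP100 digital signal processing function library. It exploits the multicluster very long instruction word architecture through special hardware instructions, zero-overhead loops, and instruction reordering and parallel issue, and it implements parallel optimized versions on top of the library's existing sequential versions. Experimental results show that, in four-macro parallel mode, every function reaches a speedup of 9 or more, 90% of the functions exceed a speedup of 10, and the average speedup is 11.12.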


CLC Number: