References
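[1]Gao Wei,Zhao Rongcai,Han Lin,et al.Overview of SIMD Automatic Vectorization Compilation Optimization[J].Journal of Software,2015,26(6):1265-1284.(in Chinese)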
[2]Intel Company.Intel C++ Compiler[EB/OL].[2016-03-21].http://icc.gnu.org.
[3]Free Software Foundation.GNU Compiler Collection[EB/OL].[2016-05-11].http://gcc.gnu.org.
[4]Open64 Compiler[EB/OL].[2016-05-23].http://open64.sourceforge.net.
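[5]Wei Shuai.Research on SIMD-oriented Vectorization Algorithms and Regrouping Techniques[D].Zhengzhou,China:PLA Information Engineering University,2012.(in Chinese)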
[6]Allen R,Kennedy K.Optimizing Compilers for Modern Architectures[M].[S.l.]:Morgan Kaufmann Publishers,2001.
[7]Nuzman D,Zaks A.Outer-loop Vectorization-revisited for Short SIMD Architectures[C]//Proceedings of 2008 International Conference on Parallel Architectures and Compilation Techniques.Toronto,Canada:[s.n.],2008:215-222.
[8]Trifunovic K,Nuzman D,Cohen A,et al.Polyhedral-model Guided Loop-nest Auto-vectorization[C]//Proceedings of 2009 International Conference on Parallel Architectures and Compilation Techniques.Raleigh,USA:[s.n.],2009:326-333.
[9]Kong M,Veras R,Stock K.When Polyhedral Transformations Meet SIMD Code Generation[C]//Proceedings of 2013 Conference on Programming Language Design and Implementation.Washington D.C.,USA:IEEE Press,2013:125-132.
[10]Barik R,Zhao Jisheng,Sarkar V.Efficient Selection of Vector Instructions Using Dynamic Programming[C]//Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture.Atlanta,USA:[s.n.],2010:226-277.
[11]Liu Jun,Zhang Yuanrui,Kandemir M.A Compiler Framework for Extracting Superword Level Parallelism[C]//Proceedings of 2012 Conference on Programming Language Design and Implementation.Beijing,China:[s.n.],2012:163-174.
[12]Karrenberg R,Hack S.Whole-function Vectorization[C]//Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization.Washington D.C.,USA:IEEE Press,2011:222-232.
[13]Shahbahrami A,Juurlink B,Vassiliadis S.Performance Impact of Misaligned Accesses in SIMD Extensions[C]//Proceedings of the 17th Annual Workshop on Circuits,Systems and Signal Processing.Washington D.C.,USA:IEEE Press,2006:334-342.
[14]Eichenberger A E,Wu P,O'Brien K.Vectorization for SIMD Architectures with Alignment Constraints[C]//Proceedings of ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation.New York,USA:ACM Press,2004:82-93.
[15]Larsen S,Witchel E,Amarasinghe S P.Increasing and Detecting Memory Address Congruence[C]//Proceedings of 2002 IEEE International Conference on Parallel Architectures and Compilation Techniques.Washington D.C.,USA:IEEE Press,2002:18-29.
[16]Chang H,Sung W.Efficient Vectorization of SIMD Programs with Non-aligned and Irregular Data Access Hardware[C]//Proceedings of 2008 International Conference on Compilers,Architectures and Synthesis for Embedded Systems.New York,USA:ACM Press,2008:167-176.
[17]Bik A J C,Girkar M,Grey P M,et al.Automatic Intra-register Vectorization for the Intel Architecture[J].International Journal of Parallel Programming,2002,30(2):65-98.
[18]Ren Gang,Wu Peng,Padua D A.Optimizing Data Permutations for SIMD Device[C]//Proceedings of 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation.New York,USA:ACM Press,2006:118-131.
[19]Li Yuxiang,Shi Hui,Li Chen.Vectorization-oriented Local Data Regrouping[J].Computer System,2009,30(8):1529-1534.
[20]Nuzman D,Rosen I,Zaks A.Auto-vectorization of Interleaved Data for SIMD[C]//Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation.New York,USA:ACM Press,2006:132-143.
[21]Huang Libo,Li Chen,Wang Zhiying,et al.SIF:Overcoming the Limitations of SIMD Devices via Implicit Permutation[C]//Proceedings of the 16th International Symposium on High-performance Computer Architecture.Washington D.C.,USA:IEEE Press,2010:355-366.
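[22]Yao Yuan.Research on SIMD Automatic Vectorization Recognition and Code Tuning Techniques[D].Zhengzhou,China:PLA Information Engineering University,2012.(in Chinese)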
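[23]Xiao Wei.Research on Compilation Optimization and Power Consumption of Two-dimensional SIMD Architectures[D].Shanghai,China:Fudan University,2008.(in Chinese)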