1 |
HUANG X F, TANG R, ZHOU Y, et al. DSP-based parallel optimization for real-time video stitching. Journal of Real-Time Image Processing, 2023, 20(2): 28.
doi: 10.1007/s11554-023-01275-x
|
2 |
WANG Y H, LI C, LIU C, et al. Advancing DSP into HPC, AI, and beyond: challenges, mechanisms, and future directions. CCF Transactions on High Performance Computing, 2021, 3(1): 114-125.
doi: 10.1007/s42514-020-00057-2
|
3 |
KIM W, LEE S, YUN I, et al. Energy-efficient dataflow scheduling of CNN applications for vector-SIMD DSP. IEEE Access, 2022, 10: 86234-86247.
doi: 10.1109/ACCESS.2022.3197206
|
4 |
HAJIRASSOULIHA A, TABERNER A J, NASH M P, et al. Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms. Signal Processing: Image Communication, 2018, 68: 101-119.
doi: 10.1016/j.image.2018.07.007
|
5 |
方建滨, 杜琦, 唐滔, 等. 飞腾处理器与商用处理器性能比较. 计算机工程与科学, 2019, 41(1): 1-8.
doi: 10.3969/j.issn.1007-130X.2019.01.001
|
|
FANG J B, DU Q, TANG T, et al. Performance comparison between FT-1500A and Intel Xeon. Computer Engineering and Science, 2019, 41(1): 1-8.
|
6 |
HASHEMI B, NAKATSUKASA Y, TREFETHEN L N. Rectangular eigenvalue problems. Advances in Computational Mathematics, 2022, 48(6): 80.
doi: 10.1007/s10444-022-09994-8
|
7 |
ANDERSON E. LAPACK Users' guide. Third ed. [S. l.]: Society for Industrial and Applied Mathematics, 1999.
|
8 |
LENG H N, HE Z Q. Eigenvalue bounds for symmetric matrices with entries in one interval. Applied Mathematics and Computation, 2017, 299: 58-65.
doi: 10.1016/j.amc.2016.11.035
|
9 |
HERNANDEZ T M, VAN BEEUMEN R, CAPRIO M A, et al. A greedy algorithm for computing eigenvalues of a symmetric matrix with localized eigenvectors. Numerical Linear Algebra with Applications, 2021, 28(2): 1-16.
|
10 |
刘彦. 基于飞腾2000+的BLAS3函数优化与实现[D]. 长沙: 湖南大学, 2020.
|
|
LIU Y. Optimization and implementation of BLAS3 function based on FT-2000+[D]. Changsha: Hunan University, 2020. (in Chinese)
|
11 |
LIU F F, MA W J, ZHAO Y W, et al. xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor. CCF Transactions on High Performance Computing, 2023, 5(1): 56-71.
doi: 10.1007/s42514-022-00126-8
|
12 |
吴颖. 基于鲲鹏处理器的LAPACK对称矩阵方程求解例程的性能优化研究[D]. 兰州: 兰州大学, 2022.
|
|
WU Y. Research on performance optimization of LAPACK routines for solving symmetric matrix linear equations based on Kunpeng processor[D]. Lanzhou: Lanzhou University, 2022. (in Chinese)
|
13 |
刘斌斌, 顾乃杰, 任开新, 等. LAPACK线性方程求解函数在龙芯3A上的并行化. 小型微型计算机系统, 2014, 35(5): 1085-1089.
doi: 10.3969/j.issn.1000-1220.2014.05.028
|
|
LIU B B, GU N J, REN K X, et al. Parallelization of LAPACK linear equation functions based on Loongson 3A. Journal of Chinese Computer Systems, 2014, 35(5): 1085-1089.
|
14 |
邢克飞, 王跃科, 扈啸. 银河飞腾DSP芯片总剂量辐照试验研究. 半导体技术, 2006, 31(7): 493-494, 505.
doi: 10.3969/j.issn.1003-353X.2006.07.004
|
|
XING K F, WANG Y K, HU X. Total ionizing dose effects test of domestic high quality device YHFT-DSP. Semiconductor Technology, 2006, 31(7): 493-494, 505.
|
15 |
杨琳, 吴家铸, 扈啸, 等. 互相关运算在银河飞腾DSP上的实现及优化. 计算机科学, 2015, 42(11): 53-55.
|
|
YANG L, WU J Z, HU X, et al. Realization and optimization of cross-correlation based on YHFT-QDSP. Computer Science, 2015, 42(11): 53-55.
|
16 |
王正行, 曾令将. 基于飞腾M6678的向量数学库优化技术研究. 舰船电子工程, 2021, 41(3): 102-106.
|
|
WANG Z X, ZENG L J. Research on performance optimization of vector math library based on FT-M6678. Ship Electronic Engineering, 2021, 41(3): 102-106.
|
17 |
夏际金, 赵洪立, 李川. TI C66x多核DSP高级软件开发技术. 北京: 清华大学出版社, 2017.
|
|
XIA J J, ZHAO H L, LI C. Advanced software development technology of TI C66x multi-core DSP. Beijing: Tsinghua University Press, 2017.
|
18 |
胡江涛. 面向飞腾DSP的模板匹配算法的实现与优化[D]. 郑州: 郑州大学, 2020.
|
|
HU J T. Implementation and optimization of template matching algorithm for Phytium DSP[D]. Zhengzhou: Zhengzhou University, 2020. (in Chinese)
|
19 |
景德胜, 陈川, 刘婷婷. 基于FT-M6678处理器的嵌入式计算机电源设计及实现. 航空计算技术, 2021, 51(5): 122-125.
|
|
JING D S, CHEN C, LIU T T. Design and implementation of embedded computer power supply based on FT-M6678. Aeronautical Computing Technique, 2021, 51(5): 122-125.
|
20 |
CASTELLÓ A, CATALÁN S, IGUAL F D, et al. QR factorization using malleable BLAS on multicore processors[C]//Proceedings of ISC High Performance 2022. Hamburg, Germany: [s. n.], 2022: 176-189.
|
21 |
YANG L M, FOX A, SANDERS G. Rounding error analysis of mixed precision block Householder QR algorithms. SIAM Journal on Scientific Computing, 2021, 43(3): 1723-1753.
doi: 10.1137/19M1296367
|
22 |
杨永舟, 黄秀琼. 基于HLS的复数矩阵QR分解求逆算法的实现与优化. 电子技术, 2021, 50(7): 74-78.
|
|
YANG Y Z, HUANG X Q. Realization and optimization of inverse algorithm of complex matrix QR decomposition based on HLS. Electronic Technology, 2021, 50(7): 74-78.
|
23 |
孙延鹏. QR分解技术在递推系统辨识中的应用[D]. 北京: 北京交通大学, 2008.
|
|
SUN Y P. Application of QR decomposition techniques in recursive system identification[D]. Beijing: Beijing Jiaotong University, 2008. (in Chinese)
|
24 |
DONGARRA J J, DU CROZ J, HAMMARLING S, et al. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 1990, 16(1): 1-17.
|
25 |
LAWSON C L, HANSON R J, KINCAID D R, et al. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 1979, 5(3): 308-323.
|
26 |
董言治, 娄树理, 刘松涛. TMS320C6000系列DSP系统结构原理与应用教程. 北京: 清华大学出版社, 2014: 193-195.
|
|
DONG Y Z, LOU S L, LIU S T. Structure principle and application course of TMS320C6000 series DSP system. Beijing: Tsinghua University Press, 2014: 193-195.
|
27 |
孙昆磊. 国产处理器实现SAR算法[D]. 西安: 西安电子科技大学, 2021.
|
|
SUN K L. Implementation of SAR algorithm by domestic processor[D]. Xi'an: Xidian University, 2021. (in Chinese)
|