1 |
JI S, YANG M, YU K. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231. doi: 10.1109/TPAMI.2012.59
|
2 |
TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2015: 4489-4497.
|
3 |
TRAN D, BOURDEV L, FERGUS R, et al. Deep End2End Voxel2Voxel prediction[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Washington D.C., USA: IEEE Press, 2016: 17-24.
|
4 |
|
5 |
TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 6450-6459.
|
6 |
ZHANG X F, WANG J S, ZHU C, et al. AccDNN: an IP-based DNN generator for FPGAs[C]//Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Washington D.C., USA: IEEE Press, 2018: 210.
|
7 |
GEORGE J K, NEJADRIAHI H, SORGER V J. Towards on-chip optical FFTs for convolutional neural networks[C]//Proceedings of the IEEE International Conference on Rebooting Computing. Washington D.C., USA: IEEE Press, 2017: 1-4.
|
8 |
SUDA N, CHANDRA V, DASIKA G, et al. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York, USA: ACM Press, 2016: 16-25.
|
9 |
ZHANG C, SUN G Y, FANG Z M, et al. Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 38(11): 2072-2085. doi: 10.1109/TCAD.2017.2785257
|
10 |
MITTAL S. A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications, 2020, 32(4): 1109-1139. doi: 10.1007/s00521-018-3761-1
|
11 |
HU Y X, LIU Y H, LIU Z Y. A survey on convolutional neural network accelerators: GPU, FPGA and ASIC[C]//Proceedings of the 14th International Conference on Computer Research and Development. Washington D.C., USA: IEEE Press, 2022: 100-107.
|
12 |
CAO Y K, LU Z H, ZHANG J, et al. Parallel optimization of CFD core algorithms based on domestic processor. Frontiers of Data and Computing, 2021, 3(4): 93-103. (in Chinese)
|
13 |
NIELSEN M A. Neural networks and deep learning. San Francisco, USA: Determination Press, 2015.
|
14 |
XU R, MA S, GUO Y. Performance analysis of different convolution algorithms in GPU environment[C]//Proceedings of the IEEE International Conference on Networking, Architecture and Storage. Washington D.C., USA: IEEE Press, 2018: 1-10.
|
15 |
SHEVGUNOV T, EFIMOV E, GUSCHINA O. Estimation of a spectral correlation function using a time-smoothing cyclic periodogram and FFT interpolation: 2N-FFT algorithm. Sensors (Basel, Switzerland), 2022, 23(1): 215. doi: 10.3390/s23010215
|
16 |
TONG G, HUANG L B. A review of research on Winograd fast convolution. Journal of Frontiers of Computer Science & Technology, 2022, 16(5): 959-971. (in Chinese)
|
17 |
NAKASATO N. A fast GEMM implementation on the Cypress GPU. ACM SIGMETRICS Performance Evaluation Review, 2011, 38(4): 50-55. doi: 10.1145/1964218.1964227
|
18 |
WU Z, JIN X, AN H. Research and optimization of Winograd convolution algorithm on Shenwei 26010 multi-core processor. Journal of Computer Research and Development, 2024, 61(4): 955-972. (in Chinese)
|
19 |
JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia. New York, USA: ACM Press, 2014: 675-678.
|
20 |
LI M W, QU G Y, WEI D Z, et al. Performance optimization of neural network convolution based on GPU platform. Journal of Computer Research and Development, 2022, 59(6): 1181-1191. (in Chinese)
|
21 |
WU J X, QI X F, GAO Y Z. Overview of heterogeneous computing parallel programming models. Aerospace Shanghai (Chinese and English), 2021, 38(4): 1-11. (in Chinese)
|
22 |
GOTO K, VAN DE GEIJN R A. Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software, 2008, 34(3): 1-25.
|
23 |
WANG N H, CHANG X H, ZHAO Z, et al. Implementation of hybrid MPI+OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 185-199. (in Chinese)
|
24 |
TIAN Z, CHEN Y F. Performance optimization of molecular dynamics simulation on Sunway TaihuLight system. Journal of Software, 2021, 32(9): 2945-2962. (in Chinese)
|
25 |
GUO J, DING J Z, ZHU X R. An automated verification framework for mixed code of embedded real-time operating system kernels. Journal of Software, 2020, 31(5): 1353-1373. (in Chinese)
|