1 |
|
2 |
BANAKAR R, STEINKE S, LEE B S, et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems[C]//Proceedings of the 10th International Symposium on Hardware/Software Codesign. Washington D. C., USA: IEEE Press, 2002: 1587-1599.
|
3 |
SATO M, KODAMA Y, TSUJI M, et al. Co-design for A64FX manycore processor and "Fugaku". IEEE Micro, 2022, 42(2): 26-34.
doi: 10.1109/MM.2021.3136882
|
4 |
WEN H, ZHANG W. Reducing cache leakage energy for hybrid SPM-cache architectures[C]//Proceedings of 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems. New York, USA: ACM Press, 2014: 1-9.
|
5 |
刘霞. GX64-DSP片上标向量便签式存储器设计与实现[D]. 长沙: 国防科学技术大学, 2015.
|
|
LIU X. Design and implementation of GX64-DSP on-chip scalar vector scratch pad memory[D]. Changsha: National University of Defense Technology, 2015. (in Chinese)
|
6 |
WOLPERT D, BERRY C, BELL B, et al. Cores, cache, content, and characterization: IBM's second generation 14-nm product, z15. IEEE Journal of Solid-State Circuits, 2021, 56(1): 98-111.
doi: 10.1109/JSSC.2020.3030062
|
7 |
|
8 |
PHAM D, ASANO S, BOLLIGER M, et al. The design and implementation of a first-generation CELL processor[C]//Proceedings of IEEE International Solid-State Circuits Conference. Washington D. C., USA: IEEE Press, 2005: 125-136.
|
9 |
|
10 |
胡向东, 柯希明, 尹飞, 等. 高性能众核处理器申威26010. 计算机研究与发展, 2021, 58(6): 1155-1165.
|
|
HU X D, KE X M, YIN F, et al. Shenwei-26010: a high-performance many-core processor. Journal of Computer Research and Development, 2021, 58(6): 1155-1165.
|
11 |
高珂, 陈荔城, 范东睿, 等. 多核系统共享内存资源分配和管理研究. 计算机学报, 2015, 38(5): 1020-1034.
|
|
GAO K, CHEN L C, FAN D R, et al. Shared memory resources allocation and management research on multicore systems. Chinese Journal of Computers, 2015, 38(5): 1020-1034.
|
12 |
|
13 |
|
14 |
CHOQUETTE J, GANDHI W, GIROUX O, et al. NVIDIA A100 tensor core GPU: performance and innovation. IEEE Micro, 2021, 41(2): 29-35.
doi: 10.1109/MM.2021.3061394
|
15 |
高剑刚, 胡晋, 龚道永, 等. 神威·太湖之光可靠性及可用性设计与分析. 计算机研究与发展, 2021, 58(12): 2696-2707.
doi: 10.7544/issn1000-1239.2021.20200967
|
|
GAO J G, HU J, GONG D Y, et al. Design and analysis of reliability and availability on Sunway TaihuLight. Journal of Computer Research and Development, 2021, 58(12): 2696-2707.
doi: 10.7544/issn1000-1239.2021.20200967
|
16 |
LI F, LIU X, LIU Y, et al. SW_Qsim: a minimize-memory quantum simulator with high-performance on a new Sunway supercomputer[C]//Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. New York, USA: ACM Press, 2021: 1-13.
|
17 |
MANIKANTAN R, RAJAN K, GOVINDARAJAN R. Probabilistic shared cache management[C]//Proceedings of the 39th Annual International Symposium on Computer Architecture. Washington D. C., USA: IEEE Press, 2012: 428-439.
|
18 |
QURESHI M, PATT Y. Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches[C]//Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. Washington D. C., USA: IEEE Press, 2006: 323-335.
|
19 |
SANCHEZ D, KOZYRAKIS C. Scalable and efficient fine-grained cache partitioning with vantage. IEEE Micro, 2012, 32(3): 26-37.
doi: 10.1109/MM.2012.19
|
20 |
XIE Y J, LOH G. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches. Computer Architecture, 2009, 43, 556-567.
|
21 |
CHEN Y, LI W L, KIM C, et al. Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy[C]//Proceedings of 2009 IEEE International Symposium on Parallel & Distributed Processing. Washington D. C., USA: IEEE Press, 2009: 157-166.
|
22 |
NATARAJAN R, CHAUDHURI M. Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies[C]//Proceedings of 2013 IEEE International Symposium on Workload Characterization. Washington D. C., USA: IEEE Press, 2013: 553-568.
|
23 |
刘士建. 基于CPU-GPU异构架构下Cache优化技术的研究[D]. 北京: 北京工业大学, 2018.
|
|
LIU S J. Research on Cache optimization technology based on CPU-GPU heterogeneous architecture[D]. Beijing: Beijing University of Technology, 2018. (in Chinese)
|
24 |
范清文. 异构多核下Cache替换算法的性能优化研究[D]. 北京: 北京工业大学, 2017.
|
|
FAN Q W. Research on performance optimization of Cache replacement algorithm under heterogeneous multi-core[D]. Beijing: Beijing University of Technology, 2017. (in Chinese)
|
25 |
马虓. 片上异构多核处理器LLC替换策略研究[D]. 长沙: 国防科学技术大学, 2013.
|
|
MA X. Research on LLC replacement strategy of heterogeneous multi-core processor on chip[D]. Changsha: National University of Defense Technology, 2013. (in Chinese)
|
26 |
王子聪, 陈小文, 郭阳. 片上多核处理器Cache访问均衡性研究. 计算机学报, 2019, 42(11): 2403-2416.
doi: 10.11897/SP.J.1016.2019.02403
|
|
WANG Z C, CHEN X W, GUO Y. Research on cache access equalization in chip multi-processor. Chinese Journal of Computers, 2019, 42(11): 2403-2416.
doi: 10.11897/SP.J.1016.2019.02403
|
27 |
WANG Z C, CHEN X W, LI C, et al. Fairness-oriented and location-aware NUCA for many-core SoC[C]//Proceedings of the 11th IEEE/ACM International Symposium on Networks-on-Chip. New York, USA: ACM Press, 2017: 1-8.
|
28 |
ZHAO X, LIU Y X, ADILEH A, et al. LA-LLC: inter-core locality-aware last-level cache to exploit many-to-many traffic in GPGPUs. IEEE Computer Architecture Letters, 2017, 16(1): 42-45.
doi: 10.1109/LCA.2016.2611663
|
29 |
KUMAR A, DAS A, JOSE J. Reducing off-chip miss penalty by exploiting underutilised on-chip router buffers[C]//Proceedings of the 38th IEEE International Conference on Computer Design. Washington D. C., USA: IEEE Press, 2020: 532-546.
|
30 |
ALVAREZ L, VILANOVA L, GONZALEZ M, et al. Hardware-software coherence protocol for the coexistence of caches and local memories[C]//Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis. New York, USA: ACM Press, 2012: 1-11.
|
31 |
CARBALLEIRA F G, CARRETERO J, CALDERÓN A, et al. An adaptive cache coherence protocol specification for parallel input/output systems. IEEE Transactions on Parallel & Distributed Systems, 2004, 15(6): 533-545.
|
32 |
PEI S W, KIM M S, GAUDIOT J, et al. Fusion coherence: scalable cache coherence for heterogeneous kilo-core system. Computer and Information Science, 2014, 58(12): 143-156.
|
33 |
SÁNCHEZ D, KOZYRAKIS C. SCD: a scalable coherence directory with flexible sharer set encoding. IEEE Computer Society, 2012, 44(5): 1-12.
|
34 |
何锡明, 马胜, 黄立波, 等. 一种基于自更新的简单高效Cache一致性协议. 计算机研究与发展, 2019, 56(4): 719-729.
|
|
HE X M, MA S, HUANG L B, et al. A simple and efficient cache coherence protocol based on self-updating. Journal of Computer Research and Development, 2019, 56(4): 719-729.
|
35 |
MEKKAT V, HOLEY A, YEW P C, et al. Building expressive, area-efficient coherence directories[C]//Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. Washington D. C., USA: IEEE Press, 2013: 237-246.
|
36 |
LIU P, HU Q, HUA X C. Adaptive coherence granularity for multi-socket systems. IEEE Transactions on Computers, 2017, 66(8): 1302-1312.
doi: 10.1109/TC.2017.2676768
|
37 |
SHIM K S, CHO M H, LIS M, et al. Library cache coherence[C]//Proceedings of IEEE International Conference on Computer. Washington D. C., USA: IEEE Press, 2011: 457-465.
|
38 |
LIS M, SHIM K S, CHO M H, et al. Memory coherence in the age of multicores[C]//Proceedings of the 29th IEEE International Conference on Computer Design. Washington D. C., USA: IEEE Press, 2011: 1-8.
|
39 |
SINGH I, SHRIRAMAN A, FUNG W W L, et al. Cache coherence for GPU architectures[C]//Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture. Washington D. C., USA: IEEE Press, 2013: 332-345.
|
40 |
AGARWAL N, NELLANS D, EBRAHIMI E, et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence[C]//Proceedings of 2016 IEEE International Symposium on High Performance Computer Architecture. Washington D. C., USA: IEEE Press, 2016: 335-348.
|
41 |
POWER J, BASU A, GU J L, et al. Heterogeneous system coherence for integrated CPU-GPU systems[C]//Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. New York, USA: ACM Press, 2013: 457-467.
|
42 |
BASU A, PUTHOOR S, CHE S, et al. Software assisted hardware cache coherence for heterogeneous processors[C]//Proceedings of the 2nd International Symposium on Memory Systems. New York, USA: ACM Press, 2016: 279-288.
|
43 |
刘勇, 刘丽, 何王全. 面向众核多级访存资源的静态数据布局优化模型. 计算机应用与软件, 2011, 28(7): 53-56.
|
|
LIU Y, LIU L, HE W Q. A static data placement optimisation model oriented towards multi-core hierarchical accessible resources. Computer Applications and Software, 2011, 28(7): 53-56.
|
44 |
胡志刚, 石金锋, 蒋湘涛. 针对能耗热点的SPM静态分配管理策略. 计算机工程与应用, 2010, 46(3): 58-61, 75.
|
|
HU Z G, SHI J F, JIANG X T. Static allocation management strategy for SPM based on energy hotspot. Computer Engineering and Applications, 2010, 46(3): 58-61, 75.
|
45 |
NGUYEN N, DOMINGUEZ A, BARUA R. Memory allocation for embedded systems with a compile-time-unknown scratch-pad size. ACM Transactions on Embedded Computing Systems, 2021, 8(3): 221-235.
|
46 |
刘勇, 陆林生, 何王全. 一种简便的栈式片上内存动态管理方法. 计算机工程与科学, 2010, 32(9): 111-114.
|
|
LIU Y, LU L S, HE W Q. An easy stack-analogy on-chip memory dynamic allocation compilation technique. Computer Engineering & Science, 2010, 32(9): 111-114.
|
47 |
李嘉欣, 邓宁. 一种基于访问计数的SPM管理策略. 计算机工程, 2013, 39(9): 109-113.
|
|
LI J X, DENG N. A scratch pad memory management strategy based on access counting. Computer Engineering, 2013, 39(9): 109-113.
|
48 |
LIN J P, LU J, CAI J A, et al. Efficient heap data management on software managed manycore architectures[C]//Proceedings of the 32nd International Conference on VLSI Design and 18th International Conference on Embedded Systems. Washington D. C., USA: IEEE Press, 2019: 112-125.
|
49 |
UDAYAKUMARAN S, DOMINGUEZ A, BARUA R. Dynamic allocation for scratch-pad memory using compile-time decisions. ACM Transactions on Embedded Computing Systems, 2006, 5(2): 472-511.
|
50 |
BAI K, SHRIVASTAVA A, KUDCHADKER S. Stack data management for Limited Local Memory multi-core processors[C]//Proceedings of the 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors. Washington D. C., USA: IEEE Press, 2011: 578-587.
|
51 |
BAI K, SHRIVASTAVA A. Heap data management for limited local memory multi-core processors[C]//Proceedings of the 8th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. New York, USA: ACM Press, 2010: 317-326.
|
52 |
BAI K, SHRIVASTAVA A. A software-only scheme for managing heap data on limited local memory multicore processors. ACM Transactions on Embedded Computing Systems, 2013, 13(1): 332-345.
|
53 |
BAI K, SHRIVASTAVA A. Automatic and efficient heap data management for limited local memory multicore architectures[C]//Proceedings of Design, Automation & Test in Europe Conference & Exhibition. Washington D. C., USA: IEEE Press, 2013: 593-598.
|
54 |
LU J, BAI K, SHRIVASTAVA A. SSDM: smart stack data management for software managed multicores[C]//Proceedings of the 50th Annual Design Automation Conference. New York, USA: ACM Press, 2013: 1-8.
|
55 |
VENKATARAMANI V, CHAN M C, MITRA T. Scratchpad-memory management for multi-threaded applications on many-core architectures. ACM Transactions on Embedded Computing Systems, 2019, 18(1): 1-28.
|
56 |
TAO X H, PANG J M, XU J L, et al. Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture. The Journal of Supercomputing, 2021, 77(12): 14502-14524.
|
57 |
CHAKRABORTY P, PANDA P R, SEN S. Partitioning and data mapping in reconfigurable cache and scratchpad memory-based architectures. ACM Transactions on Design Automation of Electronic Systems, 2016, 22(1): 12-23.
|
58 |
李建江, 刘珍珍, 王珏. 基于IBM Cell多核平台的OpenMP数组私有化技术研究. 计算机研究与发展, 2010, 47(8): 1434-1441.
|
|
LI J J, LIU Z Z, WANG J. Optimizing OpenMP by array privatization on the multi-core platform of IBM Cell. Journal of Computer Research and Development, 2010, 47(8): 1434-1441.
|
59 |
WU M C, LIU Y, CUI H M, et al. Bandwidth-aware loop tiling for DMA-supported scratchpad memory[C]//Proceedings of ACM International Conference on Parallel Architectures and Compilation Techniques. New York, USA: ACM Press, 2020: 97-109.
|
60 |
YU C, BAI Y B, SUN Q X, et al. Improving thread-level parallelism in GPUs through expanding register file to scratchpad memory. ACM Transactions on Architecture and Code Optimization, 2019, 15(4): 235-243.
|
61 |
ZHOU B, HUANG Y Z, XU J C, et al. Memory latency optimizations for the elementary functions on the Sunway architecture. The Journal of Supercomputing, 2019, 75(7): 3917-3944.
|
62 |
凌明. Cache/SPM共存架的动态存储布局优化技术研究[D]. 南京: 东南大学, 2011.
|
|
LING M. Research on dynamic storage layout optimization technology for Cache/SPM coexistence architecture[D]. Nanjing: Southeast University, 2011. (in Chinese)
|
63 |
WU L, ZHANG W. Cache-aware SPM allocation algorithms for hybrid SPM-cache architectures[C]//Proceedings of the 16th International Symposium on Quality Electronic Design. Washington D. C., USA: IEEE Press, 2015: 432-441.
|
64 |
凌明, 张阳, 梅晨, 等. 一种面向能耗的可重构片上统一存储架构. 东南大学学报(自然科学版), 2011, 41(6): 1137-1145.
|
|
LING M, ZHANG Y, MEI C, et al. Energy-oriented reconfigurable on-chip unified memory architecture. Journal of Southeast University (Natural Science Edition), 2011, 41(6): 1137-1145.
|
65 |
KOMURAVELLI R, SINCLAIR M D, ALSOP J, et al. Stash: have your scratchpad and cache it too[C]//Proceedings of the 42nd Annual International Symposium on Computer Architecture. New York, USA: ACM Press, 2015: 707-719.
|