Crossbar-Aware Mixed Precision Quantization

doi:10.19678/j.issn.1000-3428.0252963

Abstract

The Memristive Crossbar Array (MCA) is the fundamental hardware component of the Computing-in-Memory (CIM) architecture and performs matrix operations with O(1) time complexity. However, because each device has a limited bit-width, existing methods often need many memory cells to represent a single value. This increases hardware resource consumption and makes it difficult to achieve high precision and high energy efficiency at the same time. To address this issue, this paper proposes a crossbar-aware mixed-precision quantization method. The method first uses K-means clustering to rearrange output channels, making the weight distribution within each sublayer more consistent; this reduces quantization error and improves post-quantization model accuracy. Building on this, sublayers are partitioned according to the physical constraints of the MCA so that the output channel count of each sublayer matches the parallel processing capacity of the MCA, which reduces the number of dequantization operations and lowers computational complexity. In addition, an array-aware regularization term is introduced that combines the number of MCAs required by each sublayer with group Lasso regularization. This term dynamically induces bit-level sparsity in the weights, compressing bit-width while reducing hardware resource overhead. Experiments on different neural networks (ResNet/VGG) show that the method quantizes models to an average of 1.3 bits with an accuracy loss of no more than 0.2%, and reduces hardware area overhead by about 74% compared with traditional quantization methods. Compared with existing quantization schemes, the proposed method jointly optimizes accuracy and hardware resources at very low bit-widths.
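The three mechanisms above (K-means channel rearrangement, crossbar-sized sublayers that each share one dequantization scale, and a crossbar-weighted group Lasso on bit planes) can be illustrated with the minimal pure-Python sketch below. It rests on assumptions of our own, not details given by the paper: the clustering feature is the per-channel maximum |w|, each sublayer holds `channels_per_xbar` output channels, quantization is symmetric and uniform, cells store 1 bit with each weight bit-sliced across columns, and the penalty is evaluated on hard bit planes. All function names are hypothetical.

```python
import math
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Lloyd's k-means on scalar features; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initial centers drawn from the data
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center (ties go to the lowest index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def rearrange_channels(weights, k, seed=0):
    """Order output channels so that channels with similar ranges become neighbours.

    Hypothetical feature: per-channel max |w|, since it sets the shared scale.
    """
    feats = [max(abs(w) for w in ch) for ch in weights]
    labels, centers = kmeans_1d(feats, k, seed=seed)
    return sorted(range(len(weights)), key=lambda i: (centers[labels[i]], feats[i]))


def partition(order, channels_per_xbar):
    """Split the channel order into sublayers matching the crossbar's parallel width."""
    return [order[i:i + channels_per_xbar] for i in range(0, len(order), channels_per_xbar)]


def quantize_sublayer(channels, bits):
    """Symmetric uniform quantization with ONE scale per sublayer (assumes bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1
    wmax = max(abs(w) for ch in channels for w in ch)
    scale = wmax / qmax if wmax > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(w / scale))) for w in ch] for ch in channels]
    return q, scale


def quantization_error(weights, order, channels_per_xbar, bits):
    """Total squared error when every sublayer uses a single dequantization scale."""
    err = 0.0
    for group in partition(order, channels_per_xbar):
        chans = [weights[i] for i in group]
        q, scale = quantize_sublayer(chans, bits)
        err += sum((w - qi * scale) ** 2
                   for ch, qch in zip(chans, q) for w, qi in zip(ch, qch))
    return err


def crossbars_needed(n_channels, in_features, bits, rows=128, cols=128):
    """Assumed mapping: inputs on rows, each channel spread over `bits` 1-bit columns."""
    return math.ceil(in_features / rows) * math.ceil(n_channels * bits / cols)


def array_aware_group_lasso(sublayer_q, sublayer_xbars, bits):
    """Sum over sublayers of N_xbar(s) * sum_b ||magnitude bit plane b||_2 (hard bits)."""
    total = 0.0
    for q_ints, n_xbar in zip(sublayer_q, sublayer_xbars):
        mags = [abs(q) for q in q_ints]
        for b in range(bits - 1):  # magnitude bits only; the sign bit is not penalized
            plane = [(m >> b) & 1 for m in mags]
            total += n_xbar * math.sqrt(sum(x * x for x in plane))
    return total


if __name__ == "__main__":
    big, small = [0.3, -0.2, 0.1, 0.3], [0.003, -0.002, 0.001, 0.003]
    W = [big, small] * 4  # alternating wide-range and narrow-range channels
    naive = quantization_error(W, list(range(len(W))), 2, bits=3)
    order = rearrange_channels(W, k=4)
    print("naive:", naive, "rearranged:", quantization_error(W, order, 2, bits=3))
```

In the demo, the original order pairs each wide channel with a narrow one, so the narrow channel's values round to zero under the wide channel's scale; after rearrangement, each sublayer contains channels of similar range. With a single scalar feature, the K-means ordering is close to a plain sort; richer per-channel features would make the clustering step matter more. A trainable version would apply the penalty to continuous bit-plane variables (for example, with a straight-through gradient estimator) instead of hard bits, so that all-zero planes, such as bit 0 of `[2, -2, 0, 2]`, can be removed to lower a sublayer's bit-width and, through `crossbars_needed`, its crossbar count.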

Li Qian, Liu Peng, Yao Lian, Wu Jigang. Crossbar-Aware Mixed Precision Quantization[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252963.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252963
