[1] LE SCAO T, FAN A, AKIKI C, et al. BLOOM: a
176B-parameter open-access multilingual language model[EB/OL].
(2022-11-09) [2025-06-07]. https://arxiv.org/abs/2211.05100.
[2] GALLIFANT J, FISKE A, LEVITES STREKALOVA Y A, et al. Peer
review of GPT-4 technical report and systems card[J]. PLOS Digital
Health, 2024, 3(1): e0000417.
[3] KARUMBUNATHAN L S. NVIDIA Jetson AGX Orin series[EB/OL].
[2025-06-07]. https://www.nvidia.cn/content/dam/en-zz/Solutions/gtcf21/jetson-orin/nvidia-jetson-agx-orin-technical-brief.pdf.
[4] 杨春,张睿尧,黄泷,等.深度神经网络模型量化方法综述[J].工程科学
学报, 2023, 45(10): 1613-1629.
YANG C, ZHANG R Y, HUANG L, et al. A survey of quantization
methods for deep neural network models[J]. Chinese Journal of
Engineering, 2023, 45(10): 1613-1629.
[5] CHEN M Z, SHAO W Q, XU P, et al. EfficientQAT: efficient
quantization-aware training for large language models[EB/OL].
(2024-07-10)[2025-06-07]. https://arxiv.org/abs/2407.11062.
[6] HASAN J. Optimizing large language models through quantization: a
comparative analysis of PTQ and QAT techniques[EB/OL].
(2024-11-09)[2025-06-07]. https://arxiv.org/abs/2411.06084.
[7] KERKOURI M A, TLIBA M, CHETOUANI A, et al. Quantization
effects on neural networks perception: how would quantization change
the perceptual field of vision models?[C]//Proceedings of the IEEE
Thirteenth International Conference on Image Processing Theory,
Tools and Applications. Washington D. C., USA: IEEE Press, 2024:
1-6.
[8] JACOB B, KLIGYS S, CHEN B, et al. Quantization and training of
neural networks for efficient integer-arithmetic-only inference[C]
//Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. Washington D. C., USA: IEEE Press, 2018: 2704-2713.
[9] COURBARIAUX M, HUBARA I, SOUDRY D, et al. Binarized neural
networks: training deep neural networks with weights and activations
constrained to +1 or -1[EB/OL]. (2016-07-09)[2025-09-01].
https://arxiv.org/abs/1602.02830.
[10] 咸聪慧,王天一,李超,等.基于量化的深度神经网络优化研究综述[J].
山东师范大学学报(自然科学版), 2024, 39(1): 21-32.
XIAN C H, WANG T Y, LI C, et al. A survey of optimization research
on deep neural networks based on quantization[J]. Journal of Shandong
Normal University (Natural Science Edition), 2024, 39(1): 21-32.
[11] CHENG Y, WANG D, ZHOU P, et al. A survey of model compression
and acceleration for deep neural networks[EB/OL].
(2020-06-30)[2025-09-01]. https://arxiv.org/abs/1710.09282v1.
[12] KIM S, HOOPER C, WATTANAWONG T, et al. Full stack
optimization of transformer inference: a survey[EB/OL]. (2023-07-27)
[2025-09-01]. https://arxiv.org/abs/2302.14017.
[13] ZHU X, LI J, LIU Y, et al. A survey on model compression for large
language models[J]. Transactions of the Association for Computational
Linguistics, 2024, 12: 1556-1577.
[14] QIN H, GONG R, LIU X, et al. Binary neural networks: a survey[J].
Pattern Recognition, 2020, 105: 107281.
[15] ZHANG Z, GAO Y C, FAN J, et al. SelectQ: calibration data selection
for post-training quantization[J]. Machine Intelligence Research, 2025,
22(3): 499-510.
[16] HUBARA I, NAHSHAN Y, HANANI Y, et al. Improving post training
neural quantization: layer-wise calibration and integer programming
[EB/OL]. (2020-06-14)[2025-06-07]. https://arxiv.org/abs/2006.10518.
[17] GONG R, LIU X L, JIANG S H, et al. Differentiable soft quantization:
bridging full-precision and low-bit neural networks[C]//Proceedings of
the IEEE/CVF International Conference on Computer Vision.
Washington D. C., USA: IEEE Press, 2019: 4852-4861.
[18] CHICCO D, WARRENS M J, JURMAN G. The coefficient of
determination R-squared is more informative than SMAPE, MAE,
MAPE, MSE and RMSE in regression analysis evaluation[J]. PeerJ
Computer Science, 2021, 7: e623.
[19] 郭秋丹,濮约刚,张启军,等.基于舍入误差的神经网络量化方法[J].计
算机工程与设计, 2024, 45(8): 2534-2539.
GUO Q D, PU Y G, ZHANG Q J, et al. Neural network quantization
method based on rounding errors[J]. Computer Engineering and
Design, 2024, 45(8): 2534-2539.
[20] KYURKCHIEV N, MARKOV S. Sigmoid functions: some
approximation and modelling aspects[J]. LAP LAMBERT Academic
Publishing, 2015, 4: 34.
[21] LIU L Y, JIANG H M, HE P C, et al. On the variance of the adaptive
learning rate and beyond[C]//Proceedings of the 8th International
Conference on Learning Representations. Washington D. C., USA:
IEEE Press, 2020: 1-14.
[22] WU D, TANG Q, ZHAO Y, et al. EasyQuant: post-training
quantization via scale optimization[EB/OL]. (2020-06-30)
[2025-06-07]. https://arxiv.org/abs/2006.16669.
[23] DING Y F, FENG W L, CHEN C Y, et al. Reg-PTQ:
regression-specialized post-training quantization for fully quantized
object detector[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2024: 16174-16184.
[24] WANG Z W, WU Z Y, LU J W, et al. BiDet: an efficient binarized
object detector[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2020: 2049-2058.
[25] CHEN P, LIU J, ZHUANG B H, et al. AQD: towards accurate
quantized object detection[C]//Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. Washington
D. C., USA: IEEE Press, 2021: 104-113.
[26] NIU L, LIU J W, YUAN Z H, et al. Improving post-training
quantization on object detection with task loss-guided Lp metric
[EB/OL]. (2023-05-07)[2025-06-07]. https://arxiv.org/abs/2304.09785.
[27] SO J, LEE J, AHN D, et al. Temporal dynamic quantization for
diffusion models[J]. Advances in Neural Information Processing
Systems, 2023, 36: 48686-48698.
[28] WANG C Y, WANG Z W, XU X W, et al. Towards accurate
post-training quantization for diffusion models[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Washington D. C., USA: IEEE Press, 2024: 16026-16035.
[29] LIU X C, YE M, ZHOU D Y, et al. Post-training quantization with
multiple points: mixed precision without mixed precision[C]//
Proceedings of the AAAI Conference on Artificial Intelligence.
Washington D. C., USA: IEEE Press, 2021, 35(10): 8697-8705.
[30] HE Y F, LIU J, WU W J, et al. EfficientDM: efficient
quantization-aware fine-tuning of low-bit diffusion models[C]//
Proceedings of the Twelfth International Conference on Learning
Representations. Washington D. C., USA: IEEE Press, 2024: 1-20.
[31] HUBARA I, NAHSHAN Y, HANANI Y, et al. Accurate post training
quantization with small calibration sets[C]//Proceedings of the
International Conference on Machine Learning. New York, USA: ACM
Press, 2021: 4466-4475.
[32] WEI X Y, ZHANG Y C, ZHANG X G, et al. Outlier suppression:
pushing the limit of low-bit transformer language models[J]. Advances
in Neural Information Processing Systems, 2022, 35: 17402-17414.
[33] WEI X Y, ZHANG Y C, LI Y H, et al. Outlier suppression+: accurate
quantization of large language models by equivalent and optimal
shifting and scaling[C]//Proceedings of the 2023 Conference on
Empirical Methods in Natural Language Processing. Stroudsburg, USA:
ACL Press, 2023: 1648-1665.
[34] KIM N J, LEE J, KIM H. HyQ: hardware-friendly post-training
quantization for CNN-transformer hybrid networks[C]//Proceedings of
the Thirty-Third International Joint Conference on Artificial
Intelligence. Jeju Island, Republic of Korea: [s. n.], 2024: 4291-4299.
[35] SHANG Y Z, LIU G W, KOMPELLA R R, et al. CL-Calib: enhancing
post-training quantization calibration through contrastive learning[C]
//Proceedings of the International Conference on Learning
Representations. Washington D. C., USA: IEEE Press, 2024: 1-11.
[36] LI X Y, LIU Y J, LIAN L, et al. Q-Diffusion: quantizing diffusion
models[C]//Proceedings of the IEEE/CVF International Conference on
Computer Vision. Washington D. C., USA: IEEE Press, 2023:
17535-17545.
[37] SHANG Y Z, YUAN Z H, XIE B, et al. Post-training quantization on
diffusion models[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2023: 1972-1981.
[38] TANG S, WANG X, CHEN H, et al. Post-training quantization with
progressive calibration and activation relaxing for text-to-image
diffusion models[C]//Proceedings of the European Conference on
Computer Vision. Cham: Springer Nature Switzerland, 2024: 404-420.
[39] YVINEC E, DAPOGNY A, BAILLY K. Gradient-based post-training
quantization: challenging the status quo[EB/OL]. [2025-06-07].
https://arxiv.org/abs/2308.07662.
[40] JIANG Y F, SUN N, XIE X, et al. ADFQ-ViT: activation-distribution-
friendly post-training quantization for vision transformers[J]. Neural
Networks, 2025, 186: 107289.
[41] LI Z K, XIAO J R, YANG L, et al. RepQ-ViT: scale reparameterization
for post-training quantization of vision transformers [C]//Proceedings
of the IEEE/CVF International Conference on Computer Vision.
Washington D. C., USA: IEEE Press, 2023: 17227-17236.
[42] OH S, SIM H, KIM J, et al. Non-uniform step size quantization for
accurate post-training quantization[C]//Proceedings of the European
Conference on Computer Vision. Cham: Springer Nature Switzerland,
2022: 658-673.
[43] MOON J, KIM D, CHEON J, et al. Instance-aware group quantization
for vision transformers[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2024: 16132-16141.
[44] RANJAN N, SAVAKIS A. LRP-QViT: mixed-precision vision
transformer quantization via layer-wise relevance propagation[EB/OL].
[2025-06-07]. https://arxiv.org/abs/2401.11243.
[45] LEE J, KWON Y, PARK S, et al. Q-HyViT: post-training quantization
of hybrid vision transformers with bridge block reconstruction for
IoT systems[J]. IEEE Internet of Things Journal, 2024, 11(22): 36384-
36396.
[46] YANG D W, HE N, HU X, et al. Post-training quantization for
re-parameterization via coarse & fine weight splitting[J]. Journal of
Systems Architecture, 2024, 147: 103065.
[47] RYU H, LIM S, SHIM H. Memory-efficient fine-tuning for quantized
diffusion model[C]//Proceedings of the European Conference on
Computer Vision. Cham: Springer Nature Switzerland, 2024: 356-372.
[48] WANG H X, SHANG Y Z, YUAN Z H, et al. QuEST: low-bit
diffusion model quantization via efficient selective finetuning[EB/OL].
[2025-06-07]. https://arxiv.org/abs/2402.03666.
[49] LIU X W, LI Z K, XIAO J R, et al. EDA-DM: enhanced distribution
alignment for post-training quantization of diffusion models [EB/OL].
(2024-01-25)[2025-06-07]. https://arxiv.org/abs/2401.04585.
[50] LI Y H, GONG R H, TAN X, et al. BRECQ: pushing the limit of
post-training quantization by block reconstruction[C]//Proceedings of
the International Conference on Learning Representations. Washington
D. C., USA: IEEE Press, 2021: 1-16.
[51] YAO H Y, LI P, CAO J, et al. RAPQ: rescuing accuracy for
power-of-two low-bit post-training quantization[C]//Proceedings of the
Thirty-First International Joint Conference on Artificial Intelligence.
Vienna, Austria: [s. n.], 2022: 1573-1579.
[52] SHOMRON G, GABBAY F, KURZUM S, et al. Post-training
sparsity-aware quantization[J]. Advances in Neural Information
Processing Systems, 2021, 34: 17737-17748.
[53] WANG C B, ZHENG D D, LIU Y L, et al. Leveraging inter-layer
dependency for post-training quantization[J]. Advances in Neural
Information Processing Systems, 2022, 35: 6666-6679.
[54] JEON Y, LEE C, CHO E, et al. Mr.BiQ: post-training non-uniform
quantization based on minimizing the reconstruction error[C]//
Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. Washington D. C., USA: IEEE Press, 2022:
12329-12338.
[55] BAI S P, CHEN J, SHEN X T, et al. Unified data-free compression:
pruning and quantization without fine-tuning[C]//Proceedings of the
IEEE/CVF International Conference on Computer Vision. Washington
D. C., USA: IEEE Press, 2023: 5876-5885.
[56] LI Y H, PANDA P. TesseraQ: ultra low-bit LLM post-training
quantization with block reconstruction[EB/OL]. (2024-10-24)
[2025-06-07]. https://arxiv.org/abs/2410.19103.
[57] YIN J J, DONG J H, WANG Y H, et al. ModuLoRA: finetuning 2-bit
LLMs on consumer GPUs by integrating with modular quantizers[J].
Transactions on Machine Learning Research, 2024: 1-17.
[58] XU K, LI Z C, WANG S Y, et al. PTMQ: post-training multi-bit
quantization of neural networks[C]//Proceedings of the AAAI
Conference on Artificial Intelligence. Washington D. C., USA: IEEE
Press, 2024, 38(14): 16193-16201.
[59] DETTMERS T, LEWIS M, SHLEIFER S, et al. 8-bit optimizers via
block-wise quantization[C]//Proceedings of the International
Conference on Learning Representations. Washington D. C., USA:
IEEE Press, 2022: 1-19.
[60] FRANTAR E, ALISTARH D. Optimal brain compression: a
framework for accurate post-training quantization and pruning[J].
Advances in Neural Information Processing Systems, 2022, 35: 4475-
4488.
[61] NAGEL M, AMJAD R A, VAN BAALEN M, et al. Up or down? adaptive
rounding for post-training quantization[C]//Proceedings of the
International Conference on Machine Learning. New York, USA: ACM
Press, 2020: 7197-7206.
[62] 田程,李正杰,陈功富,等.深度神经网络低比特量化方法综述[J].现代
信息科技,2025,9(10):23-33+38.
TIAN C, LI Z J, CHEN G F, et al. A survey of low-bit quantization
methods for deep neural networks[J]. Modern Information Technology,
2025, 9(10): 23-33+38.
[63] HE Y F, LIU L P, LIU J, et al. PTQD: accurate post-training
quantization for diffusion models[C]//Proceedings of the 37th
International Conference on Neural Information Processing Systems.
New York, USA: ACM Press, 2023: 13237-13249.
[64] BHALGAT Y, LEE J, NAGEL M, et al. LSQ+: improving low-bit
quantization through learnable offsets and better initialization[C]//
Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops. Washington D. C., USA: IEEE Press,
2020: 696-697.
[65] LEE J H, KIM J, KWON S J, et al. FlexRound: learnable rounding
based on element-wise division for post-training quantization[C]//
Proceedings of the International Conference on Machine Learning.
New York, USA: ACM Press, 2023: 18913-18939.
[66] KIM H B, LEE J H, YOO S, et al. MetaMix: meta-state precision
searcher for mixed-precision activation quantization[C]//Proceedings
of the AAAI Conference on Artificial Intelligence. Washington D. C.,
USA: IEEE Press, 2024, 38(12): 13132-13141.
[67] ZHOU S F, LI L, ZHANG X Y, et al. LiDAR-PTQ: post-training
quantization for point cloud 3D object detection[C]//Proceedings of the
Twelfth International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2024: 1-15.
[68] LIN J, TANG J M, TANG H T, et al. AWQ: activation-aware weight
quantization for on-device LLM compression and acceleration[J].
Proceedings of Machine Learning and Systems, 2024, 6: 87-100.
[69] PAN J Y, WANG C C, ZHENG K F, et al. SmoothQuant+: accurate and
efficient 4-bit post-training weight quantization for LLM[EB/OL].
(2023-12-01)[2025-06-08]. https://arxiv.org/abs/2312.03788.
[70] WANG P Q, WANG D S, JI Y, et al. QGAN: quantized generative
adversarial networks[EB/OL]. (2019-01-23)[2025-06-08]. https://arxiv.
org/abs/1901.08263.
[71] MA Y X, LI H X, ZHENG X W, et al. Solving oscillation problem in
post-training quantization through a theoretical perspective[C]//
Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. Washington D. C., USA: IEEE Press, 2023: 7950-
7959.
[72] WEI X Y, GONG R H, LI Y H, et al. QDrop: randomly dropping
quantization for extremely low-bit post-training quantization[C]//
Proceedings of the International Conference on Learning
Representations. Washington D. C., USA: IEEE Press, 2022: 1-19.
[73] LIN Y, ZHANG T Y, SUN P Q, et al. FQ-ViT: post-training
quantization for fully quantized vision transformer[C]//Proceedings of
the Thirty-First International Joint Conference on Artificial Intelligence.
Vienna, Austria: [s. n.], 2022: 1173-1179.
[74] LV C T, CHEN H, GUO J Y, et al. PTQ4SAM: post-training
quantization for segment anything[C]//Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. Washington
D. C., USA: IEEE Press, 2024: 15941-15951.
[75] LIU Y J, YANG H R, DONG Z, et al. NoisyQuant: noisy
bias-enhanced post-training activation quantization for vision
transformers[C]// Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2023: 20321-20330.
[76] CHEE J, CAI Y, KULESHOV V, et al. QuIP: 2-bit quantization of large
language models with guarantees[J]. Advances in Neural Information
Processing Systems, 2023, 36: 4396-4429.
[77] ADEPU H, ZENG Z P, ZHANG L, et al. FrameQuant: flexible low-bit
quantization for transformers[C]//Proceedings of the 41st International
Conference on Machine Learning. New York, USA: ACM Press, 2024:
203-227.
[78] YUAN Z H, SHANG Y Z, DONG Z. PB-LLM: partially binarized
large language models[C]//Proceedings of the Twelfth International
Conference on Learning Representations. Washington D. C., USA:
IEEE Press, 2024: 1-14.
[79] LIN C, PENG B, LI Z Y, et al. Bit-Shrinking: limiting instantaneous
sharpness for improving post-training quantization[C]//Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Washington D. C., USA: IEEE Press, 2023: 16196-16205.
[80] WANG M Z, SUN H X, SHI J, et al. Q-YOLO: efficient inference for
real-time object detection[C]//Proceedings of the Asian Conference on
Pattern Recognition. Cham: Springer Nature Switzerland, 2023: 307-
321.
[81] PHAM C, HOANG A D, NGUYEN C C, et al. MetaAug: meta-data
augmentation for post-training quantization[C]//Proceedings of the
European Conference on Computer Vision. Cham: Springer Nature
Switzerland, 2024: 236-252.
[82] ZHANG X G, QIN H T, DING Y F, et al. Diversifying sample
generation for accurate data-free quantization[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Washington D. C., USA: IEEE Press, 2021: 15658-15667.
[83] GUO C, QIU Y X, LENG J W, et al. SQuant: on-the-fly data-free
quantization via diagonal hessian approximation[C]//Proceedings of
the International Conference on Learning Representations. Washington
D. C., USA: IEEE Press, 2022: 1-18.
[84] YAO Z W, WU X X, LI C, et al. Exploring post-training quantization
in LLMs from comprehensive study to low rank compensation[C]//
Proceedings of the AAAI Conference on Artificial Intelligence.
Washington D. C., USA: IEEE Press, 2024, 38(17): 19377-19385.
[85] DETTMERS T, SVIRSCHEVSKI R A, EGIAZARIAN V, et al. SPQR:
a sparse-quantized representation for near-lossless LLM weight
compression[C]//Proceedings of the Twelfth International Conference
on Learning Representations. Washington D. C., USA: IEEE Press,
2024: 1-29.
[86] CAI Y H, YAO Z W, DONG Z, et al. ZeroQ: a novel zero shot
quantization framework[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2020: 13169-13178.
[87] FAN C X, WANG Z Q, GUO D, et al. Data-free quantization via
pseudo-label filtering[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Washington D. C., USA:
IEEE Press, 2024: 5589-5598.
[88] JEON Y, LEE C, KIM H. GENIE: show me the data for quantization[C]
//Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. Washington D. C., USA: IEEE Press, 2023:
12064-12073.
[89] MASOURIS A, SHARMA M, BOGUSZEWSKI A, et al. Post-training
model quantization using GANs for synthetic data generation[EB/OL].
(2023-05-20)[2025-06-08]. https://arxiv.org/abs/2305.06052.
[90] ANDREEV P, FRITZLER A. Quantization of generative adversarial
networks for efficient inference: a methodological study[C]//
Proceedings of the 26th International Conference on Pattern
Recognition. Washington D. C., USA: IEEE Press, 2022: 2179-2185.
[91] LI Z K, MA L P, CHEN M J, et al. Patch similarity aware data-free
quantization for vision transformers[C]//Proceedings of the European
Conference on Computer Vision. Cham: Springer Nature Switzerland,
2022: 154-170.
[92] RAMACHANDRAN A, KUNDU S, KRISHNA T. CLAMP-ViT:
contrastive data-free learning for adaptive post-training quantization of
ViTs[C]//Proceedings of the European Conference on Computer Vision.
Cham: Springer Nature Switzerland, 2024: 307-325.
[93] LI J W, ZHANG T C, YEN I E H, et al. FP8-BERT: post-training
quantization for transformer[EB/OL]. [2025-06-08]. https://arxiv.org/
abs/2312.05725.
[94] CAO J L, CHOLAKKAL H, ANWER R M, et al. D2Det: towards high
quality object detection and instance segmentation[C]//Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Washington D. C., USA: IEEE Press, 2020: 11485-11494.
[95] 徐增敏,陈凯,郭威伟,等.面向轻量级卷积网络的激活函数与压缩模
型[J].计算机工程, 2022, 48(5): 1000-3428.
XU Z M, CHEN K, GUO W W, et al. Activation functions and
compressed models for lightweight convolutional networks[J].
Computer Engineering, 2022, 48(5): 1000-3428.
[96] 史宝岱,张秦,李瑶,等.面向图像目标识别的轻量化卷积神经网络[J].
计算机工程, 2022, 48(6): 1000-3428.
SHI B D, ZHANG Q, LI Y, et al. Lightweight convolutional neural
networks for image object recognition[J]. Computer Engineering, 2022,
48(6): 1000-3428.
[97] MAKHOV D, OSTAPETS R, ZHELAVSKAYA I, et al. Towards
robust full low-bit quantization of super resolution networks[C]//
Proceedings of the European Conference on Computer Vision. Cham:
Springer Nature Switzerland, 2024: 182-198.
[98] TU Z J, HU J, CHEN H T, et al. Toward accurate post-training
quantization for image super resolution[C]//Proceedings of the IEEE/
CVF Conference on Computer Vision and Pattern Recognition.
Washington D. C., USA: IEEE Press, 2023: 5856-5865.
[99] TANG C, MENG Y, JIANG J, et al. Retraining-free model quantization
via one-shot weight-coupling learning[C]//Proceedings of the IEEE/
CVF Conference on Computer Vision and Pattern Recognition.
Washington D. C., USA: IEEE Press, 2024: 15855-15865.
[100] NAGEL M, BAALEN M, BLANKEVOORT T, et al. Data-free
quantization through weight equalization and bias correction[C]//
Proceedings of the IEEE/CVF International Conference on Computer
Vision. Washington D. C., USA: IEEE Press, 2019: 1325-1334.
[101] DUNG H A, PHAM C, LE T, et al. Sharpness-aware data generation
for zero-shot quantization[C]//Proceedings of the International
Conference on Machine Learning. New York, USA: ACM Press, 2024:
12034-12045.
[102] LIU J, GONG R H, WEI X Y, et al. QLLM: accurate and efficient
low-bitwidth quantization for large language models[C]// Proceedings
of the Twelfth International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2024: 1-23.
[103] XIAO G X, LIN J, SEZNEC M, et al. SmoothQuant: accurate and
efficient post-training quantization for large language models[C]//
Proceedings of the International Conference on Machine Learning.
New York, USA: ACM Press, 2023: 38087-38099.
[104] YUAN Z H, NIU L, LIU J, et al. RPTQ: reorder-based post-training
quantization for large language models[EB/OL]. [2025-06-08]. https://
arxiv.org/abs/2304.01089.
[105] LEE C, JIN J Y, KIM T, et al. OWQ: outlier-aware weight
quantization for efficient fine-tuning and inference of large language
models[C]// Proceedings of the AAAI Conference on Artificial
Intelligence. Washington D. C., USA: IEEE Press, 2024, 38(12):
13355-13364.
[106] KIM S, HOOPER C R C, GHOLAMI A, et al. SqueezeLLM: dense-
and-sparse quantization[C]//Proceedings of the International
Conference on Machine Learning. New York, USA: ACM Press, 2024:
23901-23923.
[107] KIM Y J, HENRY R, FAHIM R, et al. FineQuant: unlocking
efficiency with fine-grained weight-only quantization for
LLMs[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2308.09723.
[108] YAO Z W, WU X X, LI C, et al. Exploring post-training quantization
in LLMs from comprehensive study to low rank compensation[C]//
Proceedings of the AAAI Conference on Artificial Intelligence.
Washington D. C., USA: IEEE Press, 2024, 38(17): 19377-19385.
[109] DING X, LIU X Y, TU Z J, et al. CBQ: cross-block quantization for
large language models[C]//Proceedings of the International Conference
on Learning Representations. Washington D. C., USA: IEEE Press,
2025: 1-20.
[110] YAO Z W, YAZDANI AMINABADI R, ZHANG M J, et al.
ZeroQuant: efficient and affordable post-training quantization for
large-scale transformers[J]. Advances in Neural Information Processing
Systems, 2022, 35: 27168-27183.
[111] ZAFRIR O, BOUDOUKH G, IZSAK P, et al. Q8BERT: quantized 8bit
BERT[C]//Proceedings of the Fifth Workshop on Energy Efficient
Machine Learning and Cognitive Computing-NeurIPS Edition. New York,
USA: ACM Press, 2019: 36-39.
[112] SHEN S, DONG Z, YE J, et al. Q-BERT: Hessian based ultra low
precision quantization of BERT[C]//Proceedings of the AAAI
Conference on Artificial Intelligence. Washington D. C., USA: IEEE
Press, 2020, 34(5): 8815-8821.
[113] LIU Z, ZHAO C, FEDOROV I, et al. SpinQuant: LLM quantization
with learned rotations[C]//Proceedings of the Thirteenth International
Conference on Learning Representations. Washington D. C., USA:
IEEE Press, 2024: 1-24.
[114] LIN Y, TANG H, YANG S, et al. QServe: W4A8KV4 quantization
and system co-design for efficient LLM serving[C]//Proceedings of
the Eighth Conference on Machine Learning and Systems. Washington
D. C., USA: IEEE Press, 2024: 1-28.
[115] LIU Z H, WANG Y H, HAN K, et al. Post-training quantization for
vision transformer[J]. Advances in Neural Information Processing
Systems, 2021, 34: 28092-28103.
[116] WU Z G, CHEN J X, ZHONG H W, et al. AdaLog: post-training
quantization for vision transformers with adaptive logarithm
quantizer[C]//Proceedings of the European Conference on Computer
Vision. Cham: Springer Nature Switzerland, 2024: 411-427.
[117] YUAN Z H, XUE C H, CHEN Y Q, et al. PTQ4ViT: post-training
quantization for vision transformers with twin uniform quantization[C]
//Proceedings of the European Conference on Computer Vision. Cham:
Springer Nature Switzerland, 2022: 191-207.
[118] LIU X Y, DING X, YU L, et al. PQ-SAM: post-training quantization
for segment anything model[C]//Proceedings of the European
Conference on Computer Vision. Cham: Springer Nature Switzerland,
2024: 420-437.
[119] ZHONG Y S, HU J W, HUANG Y, et al. ERQ: error reduction for
post-training quantization of vision transformers[C]//Proceedings of
the International Conference on Machine Learning. New York, USA:
ACM Press, 2024: 61664-61680.
[120] RANJAN N, SAVAKIS A. Mix-QViT: mixed-precision vision
transformer quantization driven by layer importance and quantization
sensitivity[EB/OL]. (2025-01-15)[2025-06-08]. https://arxiv.org/abs/2501.06357.
[121] YAO Y Z, TIAN F, CHEN J, et al. Timestep-aware correction for
quantized diffusion models[C]//Proceedings of the European
Conference on Computer Vision. Cham: Springer Nature Switzerland,
2024: 215-232.
[122] YANG Y W, DAI X L, WANG J L, et al. Efficient quantization
strategies for latent diffusion models[EB/OL]. [2025-06-08]. https://
arxiv.org/abs/2312.05431.
[123] ZHAO T C, NING X F, FANG T C, et al. MixDQ: memory-efficient
few-step text-to-image diffusion models with metric-decoupled mixed
precision quantization[C]//Proceedings of the European Conference on
Computer Vision. Cham: Springer Nature Switzerland, 2024: 285-302.
[124] PARK G, KIM M, LEE S, et al. LUT-GEMM: quantized matrix
multiplication based on LUTs for efficient inference in large-scale
generative language models[C]//Proceedings of the Twelfth
International Conference on Learning Representations. Washington D.
C., USA: IEEE Press, 2024: 1-18.
[125] DETTMERS T, LEWIS M, BELKADA Y, et al. GPT3.int8(): 8-bit
matrix multiplication for transformers at scale[J]. Advances in Neural
Information Processing Systems, 2022, 35: 30318-30332.
[126] HOOPER C, KIM S, MOHAMMADZADEH H, et al. KVQuant:
towards 10 million context length LLM inference with KV cache
quantization[J]. Advances in Neural Information Processing Systems,
2024, 37: 1270-1303.
[127] YUE Y X, YUAN Z H, DUANMU H, et al. WKVQuant: quantizing
weight and key/value cache for large language models gains
more[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2402.12065.
[128] GUO C, TANG J M, HU W M, et al. OliVe: accelerating large
language models via hardware-friendly outlier-victim pair quantization
[C]//Proceedings of the 50th Annual International Symposium on
Computer Architecture. New York, USA: ACM Press, 2023: 1-15.
[129] GUO Y P, LANG Y L, REN Q Y. GPTQT: quantize large language
models twice to push the efficiency[C]//Proceedings of the IEEE
International Conference on Cybernetics and Intelligent Systems and
IEEE International Conference on Robotics, Automation and
Mechatronics. Washington D. C., USA: IEEE Press, 2024: 368-373.
[130] BAI H L, HOU L, SHANG L F, et al. Towards efficient post-training
quantization of pre-trained language models[J]. Advances in Neural
Information Processing Systems, 2022, 35: 1405-1418.
[131] MA Y X, LI H X, ZHENG X W, et al. Outlier-aware slicing for post-
training quantization in vision transformer[C]//Proceedings of the 41st
International Conference on Machine Learning. New York, USA: ACM
Press, 2024: 33811-33825.
[132] LIU J, NIU L, YUAN Z, et al. PD-Quant: post-training quantization
based on prediction difference metric[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Washington D. C., USA: IEEE Press, 2023: 24427-24437.
[133] SHAO W, CHEN M, ZHANG Z, et al. OmniQuant: omnidirectionally
calibrated quantization for large language models[C]//Proceedings of
the Twelfth International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2024: 1-25.
[134] JØRGENSEN T E. Resource-efficient language models: quantization
for fast and accessible inference[EB/OL]. (2025-05-15)[2025-06-08].
https://arxiv.org/abs/2505.08620.
[135] INTEL. Neural compressor[EB/OL]. (2023-10-25)[2025-06-07].
https://github.com/intel/neural-compressor.
[136] CHENG W H, ZHANG W W, SHEN H H, et al. Optimize weight
rounding via signed gradient descent for the quantization of LLMs
[C]//Proceedings of the Conference on Empirical Methods in Natural
Language Processing. Stroudsburg, USA: ACL Press, 2024: 11332-11350.
[137] FRANTAR E, ASHKBOOS S, HOEFLER T, et al. GPTQ: accurate
post-training quantization for generative pre-trained transformers[C]
//Proceedings of the Eleventh International Conference on Learning
Representations. Washington D. C., USA: IEEE Press, 2023: 1-16.
[138] HUGGING FACE. Optimum[EB/OL]. [2025-06-07]. https://github.
com/huggingface/optimum.
[139] SHEN Y L, SONG K T, TAN X, et al. HuggingGPT: solving AI tasks
with ChatGPT and its friends in Hugging Face[J]. Advances in Neural
Information Processing Systems, 2023, 36: 38154-38180.
[140] TARAGHI M, DORCELUS G, FOUNDJEM A, et al. Deep learning
model reuse in the HuggingFace community: challenges, benefit and
trends[C]//Proceedings of the IEEE International Conference on
Software Analysis, Evolution and Reengineering. Washington D. C.,
USA: IEEE Press, 2024: 512-523.
[141] PADDLEPADDLE. PaddleSlim[EB/OL]. (2024-01-03)[2025-06-07].
https://github.com/PaddlePaddle/PaddleSlim.
[142] PADDLEPADDLE. PaddleSlim: Model automatic compression tool
(ACT) user guide[EB/OL]. [2025-06-07]. https://www.paddlepaddle.
org.cn/documentation/docs/zh/guides/infer/paddleslim/paddle_slim_cn.
html.
[143] PYTORCH. PyTorch native architecture optimization: torchao
[EB/OL]. (2024-08-08)[2025-06-07]. https://github.com/pytorch/ao.
[144] PYTORCH. PyTorch quantization support documentation[EB/OL].
[2025-06-07]. https://docs.pytorch.org/docs/stable/quantization-support.html.
[145] HUAWEI. MindSpore official english documentation[EB/OL].
[2025-06-07]. https://www.mindspore.cn/en.
[146] HUAWEI. MindSpore[EB/OL]. [2025-06-07]. https://github.com/m
indspore-ai/mindspore.
[147] TONG Z H, DU N, SONG X B, et al. Study on MindSpore deep
learning framework[C]//Proceedings of the 17th International
Conference on Computational Intelligence and Security. Washington D.
C., USA: IEEE Press, 2021: 183-186.
[148] GOOGLE. QKeras: a quantization deep learning library for
Tensorflow Keras[EB/OL]. (2021-02-19)[2025-06-07]. https://github.
com/google/qkeras.
[149] LORO F, PAU D, TOMASELLI V. A QKeras neural network zoo for
deeply quantized imaging[C]//Proceedings of the IEEE 6th
International Forum on Research and Technology for Society and
Industry. Washington D. C., USA: IEEE Press, 2021: 165-170.
[150] TENSORFLOW. Post-training quantization guide[EB/OL].
(2022-08-03)[2025-06-07]. https://www.tensorflow.org/model_optimiz
ation/guide/quantization/post_training?hl=zh-cn.
[151] PANG B, NIJKAMP E, WU Y N. Deep learning with TensorFlow: a
review[J]. Journal of Educational and Behavioral Statistics, 2020,
45(2): 227-248.
[152] ALIBABA. MNN: a blazing fast, lightweight deep learning
framework[EB/OL]. [2025-06-07]. https://github.com/alibaba/MNN.
[153] JIANG X T, WANG H, CHEN Y L, et al. MNN: a universal and
efficient inference engine[J]. Proceedings of Machine Learning and
Systems, 2020, 2: 1-13.
[154] TENCENT. NCNN: a high-performance neural network inference
framework optimized for mobile platforms[EB/OL]. [2025-06-07].
https://github.com/Tencent/ncnn.
[155] YU Y, YIN Q, ZHANG J, et al. ADMN: agent-driven modular
network for dynamic parameter sharing in cooperative multi-agent
reinforcement learning[C]//Proceedings of the Thirty-Third
International Joint Conference on Artificial Intelligence. Jeju Island,
Republic of Korea: [s. n.], 2024: 302-310.
[156] NVIDIA. NVIDIA TensorRT official documentation[EB/OL].
[2025-06-07]. https://docs.nvidia.com/deeplearning/tensorrt/.
[157] NVIDIA. NVIDIA TensorRT GitHub repository[EB/OL].
[2025-06-07]. https://github.com/NVIDIA/TensorRT.
[158] ZHOU Y, GUO Z, DONG Z, et al. TensorRT implementations of
model quantization on edge SoC[C]//Proceedings of the IEEE 16th
International Symposium on Embedded Multicore/Many-core
Systems-on-Chip. Washington D. C., USA: IEEE Press, 2023:
486-493.
[159] MICROSOFT. Quantize ONNX models[EB/OL]. [2025-06-07].
https://onnxruntime.ai/docs/performance/model-optimizations/quantiza
tion.html.
[160] MICROSOFT. ONNX runtime GitHub repository[EB/OL].
[2025-06-07]. https://github.com/microsoft/onnxruntime.
[161] INTEL. OpenVINO GitHub repository[EB/OL]. [2025-06-07].
https://github.com/openvinotoolkit/openvino.
[162] INTEL. OpenVINO™ documentation[EB/OL]. [2025-06-07]. https://
docs.openvino.ai/.
[163] DEMIDOVSKIJ A, GORBACHEV Y, FEDOROV M, et al.
OpenVINO deep learning workbench: comprehensive analysis and
tuning of neural networks inference[C]//Proceedings of the
International Conference on Computer Vision Workshop. Washington
D. C., USA: IEEE Press, 2019: 783-787.
[164] LIU Z, OGUZ B, ZHAO C, et al. LLM-QAT: data-free quantization
aware training for large language models[C]//Findings of the
Association for Computational Linguistics: ACL 2024. Stroudsburg,
USA: ACL Press, 2024: 467-484.
[165] QU X, APONTE D, BANBURY C, et al. Automatic joint structured
pruning and quantization for efficient neural network training and
compression[C]//Proceedings of the Computer Vision and Pattern
Recognition Conference. Washington D. C., USA: IEEE Press, 2025:
15234-15244.
[166] GAO T, GUO L, ZHAO S, et al. QuantNAS: quantization-aware
neural architecture search for efficient deployment on mobile
device[C]//Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition. Washington D. C., USA: IEEE Press,
2024: 1704-1713.
[167] XIA M, GAO T, ZENG Z, et al. Sheared LLaMA: accelerating
language model pre-training via structured pruning[C]//Proceedings
of the Twelfth International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2024: 1-25.
[168] JUNG S, SON C, LEE S, et al. Learning to quantize deep networks by
optimizing quantization intervals with task loss[C]//Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition.
Washington D. C., USA: IEEE Press, 2019: 4350-4359.
[169] ZHOU S, LI L, ZHANG X, et al. LiDAR-PTQ: post-training
quantization for point cloud 3D object detection[C]//Proceedings of
the Twelfth International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2024: 1-15.
[170] XU J, FAN J, NAN B, et al. ASLog: an area-efficient CNN
accelerator for per-channel logarithmic post-training quantization[J].
IEEE Transactions on Circuits and Systems I: Regular Papers, 2023,
70(12): 5380-5393.