
Computer Engineering

A Survey of Post-Training Quantization Methods

Published: 2025-11-25


Abstract: Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of a high-precision floating-point model into low-bit integer representations without retraining, using only a small amount of (or even no) unlabeled calibration data. By significantly reducing storage and computational overhead while largely preserving the original model's inference accuracy, PTQ has gained wide recognition and adoption in both academia and industry. This paper systematically reviews the research progress of PTQ along four dimensions: quantization steps, method classification, tool ecosystem, and application progress. First, a clear framework for the quantization pipeline is constructed, covering dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete taxonomy of quantization methods is proposed, spanning quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting the large-scale application of PTQ is analyzed, with a discussion of its value for hardware adaptation and engineering deployment. Finally, the paper summarizes the integration and application progress of PTQ methods and highlights the challenges faced in practice, particularly cross-modal consistency, semantic collapse at extremely low bit widths, and hardware adaptation. These challenges not only reveal the limitations of current techniques but also point to important directions for future research. This survey provides a reference framework for PTQ methods for both academia and industry, facilitating the broad application of artificial intelligence in resource-constrained scenarios.
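To make the pipeline in the abstract concrete, the sketch below illustrates its first steps, dynamic range statistics and quantization parameter calculation, followed by weight quantization and dequantization, for per-tensor asymmetric uniform quantization. This is a minimal NumPy illustration of the generic technique, not the method of any specific paper surveyed; the function names and the simple min/max calibration rule are assumptions chosen for clarity.

```python
import numpy as np

def calibrate_minmax(x: np.ndarray, num_bits: int = 8):
    """Dynamic range statistics: derive scale and zero-point from the min/max of a calibration tensor."""
    qmin, qmax = 0, (1 << num_bits) - 1
    x_min = min(float(x.min()), 0.0)  # keep zero exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)  # guard against zero range
    zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
    return scale, zero_point

def quantize(x: np.ndarray, scale: float, zero_point: int, num_bits: int = 8) -> np.ndarray:
    """Map float values to unsigned integers: q = clip(round(x / scale) + zero_point)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, (1 << num_bits) - 1).astype(np.uint8)

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values: x_hat = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize a weight tensor and measure the round-trip error.
w = np.random.randn(256, 256).astype(np.float32)
scale, zp = calibrate_minmax(w)
w_hat = dequantize(quantize(w, scale, zp), scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```

The quantization-granularity axis of the taxonomy can be read directly off this sketch: per-channel quantization simply replaces the single (scale, zero_point) pair with one pair per output channel, and the error-optimization step of the pipeline then tunes these parameters (or rounds weights adaptively) to minimize the round-trip error measured above.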
