
Computer Engineering ›› 2025, Vol. 51 ›› Issue (5): 62-72. doi: 10.19678/j.issn.1000-3428.0069206

• Artificial Intelligence and Pattern Recognition •

Support and Optimization of Multi-Granularity Quantization Framework for Deep Learning Compiler

WEI Mingkang1, LI Jianan1, HAN Lin1,2, GAO Wei2, ZHAO Rongcai2, WANG Hongsheng2

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China;
    2. National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, Henan, China
  • Received: 2024-01-11  Revised: 2024-03-10  Online: 2025-05-15  Published: 2024-05-28
  • Corresponding author: GAO Wei, E-mail: yongwu22@126.com
  • Supported by: the Henan Provincial Major Science and Technology Project (221100210600).


Abstract: With the surging demand from major manufacturers for deploying large models, the single quantization method of the deep learning compiler Tensor Virtual Machine (TVM) suffers an accuracy drop and no longer satisfies deployment requirements. Therefore, this study designs and constructs a model quantization framework with selectable granularity. The framework supports both layer-wise and channel-wise quantization flows and implements threshold-search and adaptive-rounding optimization algorithms. First, based on the quantization module "relay.quantize", a framework flow covering information annotation, threshold calibration, and quantized-graph realization is constructed, and a granularity attribute is added to explicitly identify the quantization method. Second, to address the problem that predefined calibration methods cannot determine effective quantization information, the threshold calibration and weight rounding in quantization are tuned, improving the accuracy of the quantized model. Experiments test visual networks on the ImageNet dataset. For MobileNetV1, the new quantization scheme reduces the accuracy loss after 8-bit quantization to 2.3%, and tuning further reduces this loss to 0.7%. The results show that the multi-granularity quantization framework effectively reduces quantization error.
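The layer-wise versus channel-wise distinction at the heart of the abstract can be illustrated with a small NumPy sketch. This is not the authors' TVM implementation, only a minimal stand-alone illustration: per-layer (per-tensor) quantization uses one scale for the whole weight tensor, while per-channel quantization computes one scale per output channel, which reduces error when channel ranges differ widely.

```python
import numpy as np

def quantize_per_layer(w, num_bits=8):
    """Symmetric quantization with a single scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def quantize_per_channel(w, num_bits=8):
    """Symmetric quantization with one scale per output channel (axis 0)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax   # shape (C_out,)
    q = np.clip(np.round(w / scale[:, None, None, None]), -qmax, qmax).astype(np.int8)
    return q, scale

# A conv weight whose output channels differ widely in range: the tiny
# channel is crushed by the per-layer scale but preserved per-channel.
np.random.seed(0)
w = np.random.randn(4, 3, 3, 3).astype(np.float32)
w[0] *= 0.01                                  # one channel with a tiny range

q_l, s_l = quantize_per_layer(w)
q_c, s_c = quantize_per_channel(w)
err_layer = np.abs(q_l * s_l - w).mean()
err_channel = np.abs(q_c * s_c[:, None, None, None] - w).mean()
# err_channel is noticeably smaller than err_layer for this tensor.
```

Adding a granularity attribute, as the framework does, amounts to selecting between these two scale computations per operator.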

Key words: model quantization, model deployment, model compression, inference acceleration, deep learning compiler
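The threshold-search calibration mentioned in the abstract can likewise be sketched in simplified form. The snippet below is a hypothetical illustration, not the paper's algorithm: it grid-searches a clipping threshold that minimizes quantization mean-squared error, trading outlier clipping against a finer scale for inliers (production calibrators often minimize KL divergence instead).

```python
import numpy as np

def search_threshold(x, num_bits=8, num_steps=100):
    """Grid-search a clipping threshold minimizing quantization MSE.

    Returns the best threshold and its reconstruction error.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_val = float(np.abs(x).max())
    best_t, best_err = max_val, np.inf
    for i in range(1, num_steps + 1):
        t = max_val * (i / num_steps)         # candidate clipping threshold
        scale = t / qmax
        q = np.clip(np.round(x / scale), -qmax, qmax) * scale
        err = float(np.mean((q - x) ** 2))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

np.random.seed(0)
x = np.random.randn(10000).astype(np.float32)  # activations with an outlier tail

threshold, err = search_threshold(x)

# Baseline: no clipping, scale taken directly from the absolute maximum.
naive_scale = float(np.abs(x).max()) / 127
naive_err = float(np.mean((np.round(x / naive_scale) * naive_scale - x) ** 2))
# The searched threshold never does worse than the naive full-range scale,
# since the full range is itself one of the candidates.
```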
