| 1 |
|
| 2 |
GALLIFANT J, FISKE A, STREKALOVA L, et al. Peer review of GPT-4 technical report and systems card. PLoS Digital Health, 2024, 3(1): e0000417.
doi: 10.1371/journal.pdig.0000417
|
| 3 |
|
| 4 |
杨春, 张睿尧, 黄泷, 等. 深度神经网络模型量化方法综述. 工程科学学报, 2023, 45(10): 1613-1629.
|
|
YANG C, ZHANG R Y, HUANG L, et al. A survey of quantization methods for deep neural networks. Chinese Journal of Engineering, 2023, 45(10): 1613-1629.
|
| 5 |
CHEN M Z, SHAO W Q, XU P, et al. EfficientQAT: efficient quantization-aware training for large language models[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2407.11062.
|
| 6 |
HASAN J. Optimizing large language models through quantization: a comparative analysis of PTQ and QAT techniques[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2411.06084.
|
| 7 |
AMINE KERKOURI M, TLIBA M, CHETOUANI A, et al. Quantization effects on neural networks perception: how would quantization change the perceptual field of vision models?[C]//Proceedings of the 13th International Conference on Image Processing Theory, Tools and Applications (IPTA). Washington D.C., USA: IEEE Press, 2024: 1-6.
|
| 8 |
JACOB B, KLIGYS S, CHEN B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 2704-2713.
|
| 9 |
COURBARIAUX M, HUBARA I, SOUDRY D, et al. Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1[EB/OL]. [2025-09-01]. https://arxiv.org/abs/1602.02830.
|
| 10 |
咸聪慧, 王天一, 李超, 等. 基于量化的深度神经网络优化研究综述. 山东师范大学学报(自然科学版), 2024, 39(1): 21-32.
|
|
XIAN C H, WANG T Y, LI C, et al. Review of quantization-based deep neural network optimization research. Journal of Shandong Normal University (Natural Sciences Edition), 2024, 39(1): 21-32.
|
| 11 |
|
| 12 |
|
| 13 |
ZHU X Y, LI J, LIU Y, et al. A survey on model compression for large language models. Transactions of the Association for Computational Linguistics, 2024, 12: 1556-1577.
doi: 10.1162/tacl_a_00704
|
| 14 |
QIN H T, GONG R H, LIU X L, et al. Binary neural networks: a survey. Pattern Recognition, 2020, 105: 107281.
doi: 10.1016/j.patcog.2020.107281
|
| 15 |
ZHANG Z, GAO Y C, FAN J C, et al. SelectQ: calibration data selection for post-training quantization. Machine Intelligence Research, 2025, 22(3): 499-510.
doi: 10.1007/s11633-024-1518-0
|
| 16 |
HUBARA I, NAHSHAN Y, HANANI Y, et al. Improving post training neural quantization: layer-wise calibration and integer programming[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2006.10518.
|
| 17 |
GONG R H, LIU X L, JIANG S H, et al. Differentiable soft quantization: bridging full-precision and low-bit neural networks[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2019: 4852-4861.
|
| 18 |
LI Y H, GONG R H, TAN X, et al. BRECQ: pushing the limit of post-training quantization by block reconstruction[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2021: 1-16.
|
| 19 |
CHICCO D, WARRENS M J, JURMAN G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science, 2021, 7: e623.
doi: 10.7717/peerj-cs.623
|
| 20 |
郭秋丹, 濮约刚, 张启军, 等. 基于舍入误差的神经网络量化方法. 计算机工程与设计, 2024, 45(8): 2534-2539.
|
|
GUO Q D, PU Y G, ZHANG Q J, et al. Neural network quantization method based on round-error. Computer Engineering and Design, 2024, 45(8): 2534-2539.
|
| 21 |
KYURKCHIEV N, MARKOV S. Sigmoid functions: some approximation and modelling aspects[M]. [S.l.]: LAP Lambert Academic Publishing, 2015.
|
| 22 |
LIU L Y, JIANG H M, HE P C, et al. On the variance of the adaptive learning rate and beyond[C]//Proceedings of the 8th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2020: 1-14.
|
| 23 |
|
| 24 |
DING Y F, FENG W L, CHEN C Y, et al. Reg-PTQ: regression-specialized post-training quantization for fully quantized object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 16174-16184.
|
| 25 |
WANG Z W, WU Z Y, LU J W, et al. BiDet: an efficient binarized object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2020: 2049-2058.
|
| 26 |
CHEN P, LIU J, ZHUANG B H, et al. AQD: towards accurate quantized object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2021: 104-113.
|
| 27 |
NIU L, LIU J W, YUAN Z H, et al. Improving posttraining quantization on object detection with task loss-guided LP metric[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2304.09785.
|
| 28 |
SO J, LEE J, AHN D, et al. Temporal dynamic quantization for diffusion models. Advances in Neural Information Processing Systems, 2023, 36: 48686-48698.
|
| 29 |
WANG C Y, WANG Z W, XU X W, et al. Towards accurate post-training quantization for diffusion models[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 16026-16035.
|
| 30 |
LIU X C, YE M, ZHOU D Y, et al. Post-training quantization with multiple points: mixed precision without mixed precision[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2021: 8697-8705.
|
| 31 |
HE Y F, LIU J, WU W J, et al. EfficientDM: efficient quantization-aware fine-tuning of low-bit diffusion models[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-20.
|
| 32 |
HUBARA I, NAHSHAN Y, HANANI Y, et al. Accurate post training quantization with small calibration sets[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2021: 4466-4475.
|
| 33 |
WEI X Y, ZHANG Y C, ZHANG X G, et al. Outlier suppression: pushing the limit of low-bit Transformer language models. Advances in Neural Information Processing Systems, 2022, 35: 17402-17414.
|
| 34 |
WEI X Y, ZHANG Y C, LI Y H, et al. Outlier suppression+: accurate quantization of large language models by equivalent and optimal shifting and scaling[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2023: 1648-1665.
|
| 35 |
KIM N J, LEE J, KIM H. HyQ: hardware-friendly post-training quantization for CNN-Transformer hybrid networks[C]//Proceedings of the 33rd International Joint Conference on Artificial Intelligence. Jeju Island, Republic of Korea: [s.n.], 2024: 4291-4299.
|
| 36 |
SHANG Y Z, LIU G W, KOMPELLA R R, et al. Enhancing post-training quantization calibration through contrastive learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 15921-15930.
|
| 37 |
LI X Y, LIU Y J, LIAN L, et al. Q-Diffusion: quantizing diffusion models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2023: 17489-17499.
|
| 38 |
SHANG Y Z, YUAN Z H, XIE B, et al. Post-training quantization on diffusion models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 1972-1981.
|
| 39 |
TANG S A, WANG X, CHEN H, et al. Post-training quantization with progressive calibration and activation relaxing for text-to-image diffusion models[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 404-420.
|
| 40 |
|
| 41 |
JIANG Y F, SUN N, XIE X S, et al. ADFQ-ViT: activation-distribution-friendly post-training quantization for vision Transformers. Neural Networks, 2025, 186: 107289.
doi: 10.1016/j.neunet.2025.107289
|
| 42 |
LI Z K, XIAO J R, YANG L W, et al. RepQ-ViT: scale reparameterization for post-training quantization of vision Transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2023: 17181-17190.
|
| 43 |
OH S, SIM H, KIM J, et al. Non-uniform step size quantization for accurate post-training quantization[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 658-673.
|
| 44 |
MOON J, KIM D, CHEON J, et al. Instance-aware group quantization for vision Transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 16132-16141.
|
| 45 |
RANJAN N, SAVAKIS A. LRP-QViT: mixed-precision vision Transformer quantization via layer-wise relevance propagation[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2401.11243.
|
| 46 |
LEE J, KWON Y, PARK S, et al. Q-HyViT: post-training quantization of hybrid vision Transformers with bridge block reconstruction for IoT systems. IEEE Internet of Things Journal, 2024, 11(22): 36384-36396.
doi: 10.1109/JIOT.2024.3403844
|
| 47 |
YANG D W, HE N, HU X, et al. Post-training quantization for re-parameterization via coarse & fine weight splitting. Journal of Systems Architecture, 2024, 147: 103065.
doi: 10.1016/j.sysarc.2024.103065
|
| 48 |
RYU H, LIM S, SHIM H. Memory-efficient fine-tuning for quantized diffusion model[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 356-372.
|
| 49 |
WANG H X, SHANG Y Z, YUAN Z H, et al. QuEST: low-bit diffusion model quantization via efficient selective finetuning[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2402.03666.
|
| 50 |
LIU X W, LI Z K, XIAO J R, et al. Enhanced distribution alignment for post-training quantization of diffusion models[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2401.04585.
|
| 51 |
YAO H Y, LI P, CAO J, et al. RAPQ: rescuing accuracy for power-of-two low-bit post-training quantization[C]//Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: International Joint Conferences on Artificial Intelligence Organization, 2022: 1573-1579.
|
| 52 |
SHOMRON G, GABBAY F, KURZUM S, et al. Post-training sparsity-aware quantization. Advances in Neural Information Processing Systems, 2021, 34: 17737-17748.
|
| 53 |
WANG C B, ZHENG D D, LIU Y L, et al. Leveraging inter-layer dependency for post-training quantization. Advances in Neural Information Processing Systems, 2022, 35: 6666-6679.
|
| 54 |
JEON Y, LEE C, CHO E, et al. Mr. BiQ: post-training non-uniform quantization based on minimizing the reconstruction error[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2022: 12319-12328.
|
| 55 |
BAI S P, CHEN J, SHEN X T, et al. Unified data-free compression: pruning and quantization without fine-tuning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2023: 5853-5862.
|
| 56 |
|
| 57 |
YIN J J, DONG J H, WANG Y H, et al. ModuLoRA: finetuning 2-bit LLMs on consumer GPUs by integrating with modular quantizers[EB/OL]. [2025-06-07]. https://arxiv.org/abs/2309.16119.
|
| 58 |
XU K, LI Z C, WANG S S, et al. PTMQ: post-training multi-bit quantization of neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 16193-16201.
|
| 59 |
DETTMERS T, LEWIS M, SHLEIFER S, et al. 8-bit optimizers via block-wise quantization[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2022: 1-19.
|
| 60 |
FRANTAR E, ALISTARH D. Optimal brain compression: a framework for accurate post-training quantization and pruning. Advances in Neural Information Processing Systems, 2022, 35: 4475-4488.
|
| 61 |
NAGEL M, AMJAD R A, VAN BAALEN M, et al. Up or down? Adaptive rounding for post-training quantization[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2020: 7197-7206.
|
| 62 |
WEI X Y, GONG R H, LI Y H, et al. QDROP: randomly dropping quantization for extremely low-bit post-training quantization[C]// Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2022: 1-19.
|
| 63 |
田程, 李正杰, 陈功富, 等. 深度神经网络低比特量化方法综述. 现代信息科技, 2025, 9(10): 23-33, 38.
|
|
TIAN C, LI Z J, CHEN G F, et al. Review on low-bit quantization methods for deep neural networks. Modern Information Technology, 2025, 9(10): 23-33, 38.
|
| 64 |
HE Y F, LIU L P, LIU J, et al. PTQD: accurate post-training quantization for diffusion models[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2023: 13237-13249.
|
| 65 |
BHALGAT Y, LEE J, NAGEL M, et al. LSQ+: improving low-bit quantization through learnable offsets and better initialization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D. C., USA: IEEE Press, 2020: 2978-2985.
|
| 66 |
LEE J H, KIM J, KWON S J, et al. FlexRound: learnable rounding based on element-wise division for post-training quantization[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2023: 18913-18939.
|
| 67 |
KIM H B, LEE J H, YOO S, et al. MetaMix: meta-state precision searcher for mixed-precision activation quantization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 13132-13141.
|
| 68 |
ZHOU S F, LI L, ZHANG X Y, et al. LiDAR-PTQ: post-training quantization for point cloud 3D object detection[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-15.
|
| 69 |
LIN J, TANG J M, TANG H T, et al. AWQ: activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 2024, 6: 87-100.
|
| 70 |
PAN J Y, WANG C C, ZHENG K F, et al. SmoothQuant+: accurate and efficient 4-bit post-training weight quantization for LLM[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2312.03788.
|
| 71 |
|
| 72 |
MA Y X, LI H X, ZHENG X W, et al. Solving oscillation problem in post-training quantization through a theoretical perspective[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 7950-7959.
|
| 73 |
LIN Y, ZHANG T Y, SUN P Q, et al. FQ-ViT: post-training quantization for fully quantized vision Transformer[C]//Proceedings of the 31st International Joint Conference on Artificial Intelligence. Vienna, Austria: International Joint Conferences on Artificial Intelligence Organization, 2022: 1173-1179.
|
| 74 |
LV C T, CHEN H, GUO J Y, et al. PTQ4SAM: post-training quantization for segment anything[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 15941-15951.
|
| 75 |
LIU Y J, YANG H R, DONG Z, et al. NoisyQuant: noisy bias-enhanced post-training activation quantization for vision Transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 20321-20330.
|
| 76 |
CHEE J, CAI Y H, KULESHOV V, et al. QuIP: 2-bit quantization of large language models with guarantees. Advances in Neural Information Processing Systems, 2023, 36: 4396-4429.
|
| 77 |
ADEPU H, ZENG Z P, ZHANG L, et al. FrameQuant: flexible low-bit quantization for Transformers[C]//Proceedings of the 41st International Conference on Machine Learning. New York, USA: ACM Press, 2024: 203-227.
|
| 78 |
YUAN Z H, SHANG Y Z, DONG Z. PB-LLM: partially binarized large language models[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-14.
|
| 79 |
LIN C, PENG B, LI Z Y, et al. Bit-Shrinking: limiting instantaneous sharpness for improving post-training quantization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 16196-16205.
|
| 80 |
WANG M Z, SUN H X, SHI J, et al. Q-YOLO: efficient inference for real-time object detection[C]//Proceedings of the Asian Conference on Pattern Recognition. Berlin, Germany: Springer, 2023: 307-321.
|
| 81 |
PHAM C, HOANG A D, NGUYEN C C, et al. MetaAug: meta-data augmentation for post-training quantization[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 236-252.
|
| 82 |
ZHANG X G, QIN H T, DING Y F, et al. Diversifying sample generation for accurate data-free quantization[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2021: 15658-15667.
|
| 83 |
GUO C, QIU Y X, LENG J W, et al. SQuant: on-the-fly data-free quantization via diagonal Hessian approximation[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2022: 1-18.
|
| 84 |
YAO Z, WU X, LI C, et al. ZeroQuant-V2: exploring post-training quantization in LLMs from comprehensive study to low rank compensation[EB/OL]. [2025-10-30]. https://arxiv.org/abs/2303.08302.
|
| 85 |
DETTMERS T, SVIRSCHEVSKI R A, EGIAZARIAN V, et al. SPQR: a sparse-quantized representation for near-lossless LLM weight compression[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-29.
|
| 86 |
CAI Y H, YAO Z W, DONG Z, et al. ZeroQ: a novel zero shot quantization framework[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2020: 13169-13178.
|
| 87 |
FAN C X, WANG Z Q, GUO D, et al. Data-free quantization via pseudo-label filtering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 5589-5598.
|
| 88 |
JEON Y, LEE C, KIM H Y. GENIE: show me the data for quantization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 12064-12073.
|
| 89 |
MASOURIS A, SHARMA M, BOGUSZEWSKI A, et al. Post-training model quantization using GANs for synthetic data generation[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2305.06052.
|
| 90 |
ANDREEV P, FRITZLER A. Quantization of generative adversarial networks for efficient inference: a methodological study[C]//Proceedings of the 26th International Conference on Pattern Recognition (ICPR). Washington D. C., USA: IEEE Press, 2022: 2179-2185.
|
| 91 |
LI Z K, MA L P, CHEN M J, et al. Patch similarity aware data-free quantization for vision Transformers[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 154-170.
|
| 92 |
RAMACHANDRAN A, KUNDU S, KRISHNA T. CLAMP-ViT: contrastive data-free learning for adaptive post-training quantization of ViTs[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 307-325.
|
| 93 |
|
| 94 |
CAO J L, CHOLAKKAL H, ANWER R M, et al. D2Det: towards high quality object detection and instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2020: 11482-11491.
|
| 95 |
杨雨迪, 葛海波, 辛世澳, 等. 融合超分辨率和特征增强的轻量化遥感图像小目标检测. 计算机工程, 2024, 50(11): 284-296.
|
|
YANG Y D, GE H B, XIN S A, et al. Lightweight small-object detection for remote sensing images integrating super-resolution and feature enhancement. Computer Engineering, 2024, 50(11): 284-296.
|
| 96 |
张玉博, 杨帆, 郭亚, 等. 基于视觉大模型的垃圾分类轻量化算法研究. 计算机工程, 2025, 51(7): 140-151.
|
|
ZHANG Y B, YANG F, GUO Y, et al. Research on lightweight algorithm for garbage classification based on visual large model. Computer Engineering, 2025, 51(7): 140-151.
|
| 97 |
MAKHOV D, OSTAPETS R, ZHELAVSKAYA I, et al. Towards robust full low-bit quantization of super resolution networks[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 182-198.
|
| 98 |
TU Z J, HU J, CHEN H T, et al. Toward accurate post-training quantization for image super resolution[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 5856-5865.
|
| 99 |
TANG C, MENG Y, JIANG J C, et al. Retraining-free model quantization via one-shot weight-coupling learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2024: 15855-15865.
|
| 100 |
NAGEL M, BAALEN M V, BLANKEVOORT T, et al. Data-free quantization through weight equalization and bias correction[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2019: 1325-1334.
|
| 101 |
DUNG H A, PHAM C, LE T, et al. Sharpness-aware data generation for zero-shot quantization[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2024: 12034-12045.
|
| 102 |
LIU J, GONG R H, WEI X Y, et al. QLLM: accurate and efficient low-bitwidth quantization for large language models[C]// Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-23.
|
| 103 |
XIAO G X, LIN J, SEZNEC M, et al. SmoothQuant: accurate and efficient post-training quantization for large language models[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2023: 38087-38099.
|
| 104 |
|
| 105 |
LEE C H, JIN J, KIM T, et al. OWQ: outlier-aware weight quantization for efficient fine-tuning and inference of large language models[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 13355-13364.
|
| 106 |
KIM S, HOOPER C R C, GHOLAMI A, et al. SqueezeLLM: dense-and-sparse quantization[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2024: 23901-23923.
|
| 107 |
KIM Y J, HENRY R, FAHIM R, et al. FineQuant: unlocking efficiency with fine-grained weight-only quantization for LLMs[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2308.09723.
|
| 108 |
YAO Z W, WU X X, LI C, et al. Exploring post-training quantization in LLMs from comprehensive study to low rank compensation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 19377-19385.
|
| 109 |
DING X, LIU X Y, TU Z J, et al. CBQ: cross-block quantization for large language models[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2025: 1-20.
|
| 110 |
YAO Z W, AMINABADI R Y, ZHANG M J, et al. ZeroQuant: efficient and affordable post-training quantization for large-scale Transformers. Advances in Neural Information Processing Systems, 2022, 35: 27168-27183.
|
| 111 |
FRANTAR E, ASHKBOOS S, HOEFLER T, et al. GPTQ: accurate post-training quantization for generative pre-trained Transformers[C]//Proceedings of the 11th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2023: 1-16.
|
| 112 |
ZAFRIR O, BOUDOUKH G, IZSAK P, et al. Q8BERT: quantized 8 bit BERT[C]// Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS). Washington D.C., USA: IEEE Press, 2019: 36-39.
|
| 113 |
SHEN S, DONG Z, YE J Y, et al. Q-BERT: Hessian based ultra low precision quantization of BERT[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2020: 8815-8821.
|
| 114 |
LIU Z, ZHAO C, FEDOROV I, et al. SpinQuant: LLM quantization with learned rotations[C]//Proceedings of the 13th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2025: 1-24.
|
| 115 |
LIN Y, TANG H, YANG S, et al. QServe: W4A8KV4 quantization and system co-design for efficient LLM serving[C]//Proceedings of the 8th Conference on Machine Learning and Systems. Washington D. C., USA: IEEE Press, 2025: 1-28.
|
| 116 |
LIU Z H, WANG Y H, HAN K, et al. Post-training quantization for vision Transformer. Advances in Neural Information Processing Systems, 2021, 34: 28092-28103.
|
| 117 |
WU Z, CHEN J X, ZHONG H W, et al. AdaLog: post-training quantization for vision Transformers with adaptive logarithm quantizer[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 411-427.
|
| 118 |
YUAN Z H, XUE C H, CHEN Y Q, et al. PTQ4ViT: post-training quantization for vision Transformers with twin uniform quantization[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 191-207.
|
| 119 |
LIU X Y, DING X, YU L, et al. PQ-SAM: post-training quantization for segment anything model[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 420-437.
|
| 120 |
ZHONG Y S, HU J W, HUANG Y, et al. ERQ: error reduction for post-training quantization of vision Transformers[C]//Proceedings of the International Conference on Machine Learning. New York, USA: ACM Press, 2024: 61664-61680.
|
| 121 |
RANJAN N, SAVAKIS A. Mix-QViT: mixed-precision vision Transformer quantization driven by layer importance and quantization sensitivity[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2501.06357.
|
| 122 |
YAO Y Z, TIAN F, CHEN J, et al. Timestep-aware correction for quantized diffusion models[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 215-232.
|
| 123 |
|
| 124 |
ZHAO T C, NING X F, FANG T C, et al. MixDQ: memory-efficient few-step text-to-image diffusion models with metric-decoupled mixed precision quantization[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 285-302.
|
| 125 |
PARK G, KIM M, LEE S, et al. LUT-GEMM: quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-18.
|
| 126 |
DETTMERS T, LEWIS M, BELKADA Y, et al. LLM.int8(): 8-bit matrix multiplication for Transformers at scale. Advances in Neural Information Processing Systems, 2022, 35: 30318-30332.
|
| 127 |
HOOPER C, KIM S, MOHAMMADZADEH H, et al. KVQuant: towards 10 million context length LLM inference with KV cache quantization. Advances in Neural Information Processing Systems, 2024, 37: 1270-1303.
|
| 128 |
YUE Y X, YUAN Z H, DUANMU H, et al. WKVQuant: quantizing weight and key/value cache for large language models gains more[EB/OL]. [2025-06-08]. https://arxiv.org/abs/2402.12065.
|
| 129 |
GUO C, TANG J M, HU W M, et al. OliVe: accelerating large language models via hardware-friendly outlier-victim pair quantization[C]//Proceedings of the 50th Annual International Symposium on Computer Architecture. New York, USA: ACM Press, 2023: 1-15.
|
| 130 |
GUO Y P, LANG Y L, REN Q Y. GPTQT: quantize large language models twice to push the efficiency[C]//Proceedings of the IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM). Washington D.C., USA: IEEE Press, 2024: 368-373.
|
| 131 |
BAI H L, HOU L, SHANG L F, et al. Towards efficient post-training quantization of pre-trained language models. Advances in Neural Information Processing Systems, 2022, 35: 1405-1418.
|
| 132 |
MA Y X, LI H X, ZHENG X W, et al. Outlier-aware slicing for post-training quantization in vision Transformer[C]//Proceedings of the 41st International Conference on Machine Learning. New York, USA: ACM Press, 2024: 33811-33825.
|
| 133 |
LIU J W, NIU L, YUAN Z H, et al. PD-Quant: post-training quantization based on prediction difference metric[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 24427-24437.
|
| 134 |
SHAO W, CHEN M, ZHANG Z, et al. OmniQuant: omnidirectionally calibrated quantization for large language models[C]// Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-25.
|
| 135 |
|
| 136 |
|
| 137 |
CHENG W H, ZHANG W W, SHEN H H, et al. Optimize weight rounding via signed gradient descent for the quantization of LLMs[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2024: 11332-11350.
|
| 138 |
|
| 139 |
SHEN Y L, SONG K T, TAN X, et al. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face. Advances in Neural Information Processing Systems, 2023, 36: 38154-38180.
|
| 140 |
TARAGHI M, DORCELUS G, FOUNDJEM A, et al. Deep learning model reuse in the HuggingFace community: challenges, benefit and trends[C]//Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Washington D. C., USA: IEEE Press, 2024: 512-523.
|
| 141 |
|
| 142 |
|
| 143 |
|
| 144 |
|
| 145 |
|
| 146 |
|
| 147 |
TONG Z H, DU N, SONG X B, et al. Study on MindSpore deep learning framework[C]//Proceedings of the 17th International Conference on Computational Intelligence and Security (CIS). Washington D. C., USA: IEEE Press, 2021: 183-186.
|
| 148 |
|
| 149 |
LORO F, PAU D, TOMASELLI V. A QKeras neural network zoo for deeply quantized imaging[C]// Proceedings of the IEEE 6th International Forum on Research and Technology for Society and Industry (RTSI). Washington D. C., USA: IEEE Press, 2021: 165-170.
|
| 150 |
|
| 151 |
PANG B, NIJKAMP E, WU Y N. Deep learning with TensorFlow: a review. Journal of Educational and Behavioral Statistics, 2020, 45(2): 227-248.
doi: 10.3102/1076998619872761
|
| 152 |
|
| 153 |
JIANG X T, WANG H, CHEN Y L, et al. MNN: a universal and efficient inference engine. Proceedings of Machine Learning and Systems, 2020, 2: 1-13.
|
| 154 |
|
| 155 |
YU Y, YIN Q, ZHANG J, et al. ADMN: agent-driven modular network for dynamic parameter sharing in cooperative multi-agent reinforcement learning[C]// Proceedings of the 33rd International Joint Conference on Artificial Intelligence. Jeju Island, Republic of Korea: [s.n.], 2024: 302-310.
|
| 156 |
|
| 157 |
|
| 158 |
ZHOU Y X, GUO Z S, DONG Z, et al. TensorRT implementations of model quantization on edge SoC[C]//Proceedings of the IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). Washington D. C., USA: IEEE Press, 2023: 486-493.
|
| 159 |
|
| 160 |
|
| 161 |
|
| 162 |
|
| 163 |
DEMIDOVSKIJ A, GORBACHEV Y, FEDOROV M, et al. OpenVINO deep learning workbench: comprehensive analysis and tuning of neural networks inference[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Washington D. C., USA: IEEE Press, 2019: 783-787.
|
| 164 |
LIU Z, OGUZ B, ZHAO C, et al. LLM-QAT: data-free quantization aware training for large language models[C]//Findings of the Association for Computational Linguistics: ACL 2024. Stroudsburg, USA: ACL Press, 2024: 467-484.
|
| 165 |
QU X Y, APONTE D, BANBURY C, et al. Automatic joint structured pruning and quantization for efficient neural network training and compression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2025: 15234-15244.
|
| 166 |
GAO T X, GUO L, ZHAO S W, et al. QuantNAS: quantization-aware neural architecture search for efficient deployment on mobile device[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Washington D. C., USA: IEEE Press, 2024: 1704-1713.
|
| 167 |
XIA M, GAO T, ZENG Z, et al. Sheared LLaMA: accelerating language model pre-training via structured pruning[C]// Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-25.
|
| 168 |
JUNG S, SON C, LEE S, et al. Learning to quantize deep networks by optimizing quantization intervals with task loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2019: 4345-4354.
|
| 169 |
ZHOU S, LI L, ZHANG X, et al. LiDAR-PTQ: post-training quantization for point cloud 3D object detection[C]//Proceedings of the 12th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2024: 1-15.
|
| 170 |
XU J W, FAN J S, NAN B L, et al. ASLog: an area-efficient CNN accelerator for per-channel logarithmic post-training quantization. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(12): 5380-5393.
doi: 10.1109/TCSI.2023.3315299
|