
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 190-197. doi: 10.19678/j.issn.1000-3428.0068782

• Graphics and Image Processing •

Research on Learning-Driven Image Compression Algorithm

YANG Hongju1,2,*, JI Chang1

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  • Received: 2023-11-07  Online: 2025-01-15  Published: 2024-04-26
  • Contact: YANG Hongju
  • Supported by: National Natural Science Foundation of China (62376154); Natural Science Foundation of Shanxi Province (202303021211024); Graduate Education Innovation Program Excellent Teaching Case Project (2023AL04)

Abstract:

Deep-learning-based image compression built on Convolutional Neural Networks (CNNs) has achieved excellent results. However, the receptive field of a CNN is usually limited, so it cannot learn the contextual relationships between pixels in non-local regions of an image and lacks long-range modeling and perception capability, which easily leads to distortion, artifacts, and relatively high compression rates. To address these problems, this study proposes two solutions. First, a symmetric encoder-decoder architecture composed of CNNs, a Multi-Scale Attention (MSA) mechanism, and residual units is designed; the MSA mechanism, introduced alongside the channel and spatial transformations of the image, recalibrates features and reduces redundant pixels in the latent representation. Second, a super-prior network based on a U-shaped framework is designed to obtain multi-scale contextual information at different levels; it helps extract high-level semantic features while retaining detailed low-level information, enabling better boundary refinement and detail recovery. Comparison experiments with other advanced image compression methods on the Kodak, Tecnick, and CLIC datasets show that, at the same bit rate, the proposed method improves the Peak Signal-to-Noise Ratio (PSNR) by approximately 0.3 dB, 0.6 dB, and 0.5 dB over the compared methods, respectively. The proposed method effectively enhances the reconstruction of non-repetitive texture features and image details while maintaining the compression rate.
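
For readers who want a concrete picture of the two components described above, the following is a minimal PyTorch-style sketch: a multi-scale attention block that recalibrates encoder/decoder features, and a small U-shaped hyper-network that gathers context at several resolutions to predict entropy-model parameters for the latent representation. All module names, channel widths, and layer choices here are illustrative assumptions, not the authors' implementation.

# Hedged sketch of the ideas in the abstract; names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    """Recalibrate features with an attention map computed at several scales."""
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in scales]
        )
        self.fuse = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for scale, conv in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, scale) if scale > 1 else x      # look at a coarser view
            y = conv(y)
            if scale > 1:                                       # restore the input size
                y = F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
            feats.append(y)
        attn = torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))  # per-pixel gate in [0, 1]
        return x * attn + x                                        # residual recalibration

class UShapedHyperprior(nn.Module):
    """Tiny two-level U-shaped hyper-network over the latent y (mu/sigma for an entropy model)."""
    def __init__(self, latent_ch: int = 192, hyper_ch: int = 128):
        super().__init__()
        self.down1 = nn.Conv2d(latent_ch, hyper_ch, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(hyper_ch, hyper_ch, 3, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(hyper_ch, hyper_ch, 4, stride=2, padding=1)
        self.up1 = nn.ConvTranspose2d(hyper_ch * 2, latent_ch * 2, 4, stride=2, padding=1)

    def forward(self, y):
        e1 = F.relu(self.down1(y))                      # 1/2 resolution
        e2 = F.relu(self.down2(e1))                     # 1/4 resolution (bottleneck)
        d2 = F.relu(self.up2(e2))                       # back to 1/2 resolution
        params = self.up1(torch.cat([d2, e1], dim=1))   # skip connection keeps low-level detail
        mu, sigma = params.chunk(2, dim=1)              # Gaussian parameters for each latent element
        return mu, F.softplus(sigma)

# Example: recalibrate a feature map, then predict entropy-model parameters for it.
x = torch.randn(1, 192, 32, 32)
x = MultiScaleAttention(192)(x)
mu, sigma = UShapedHyperprior(192)(x)

Such learned codecs are typically trained with the standard rate-distortion objective L = R + λD, where R is the estimated bit rate of the quantized latents, D is the reconstruction distortion (usually MSE, from which PSNR is computed), and λ selects a point on the rate-distortion curve; the PSNR gains reported above are comparisons at matched bit rates on that curve.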

Key words: deep learning, image compression, Multi-Scale Attention (MSA) mechanism, super-prior networks, Transformer