
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 190-197. doi: 10.19678/j.issn.1000-3428.0068782

• Graphics and Image Processing •

Research on Learning-Driven Image Compression Algorithm

YANG Hongju1,2,*, JI Chang1

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  • Received: 2023-11-07  Online: 2025-01-15  Published: 2024-04-26
  • Contact: YANG Hongju
  • Supported by: National Natural Science Foundation of China (62376154); Natural Science Foundation of Shanxi Province (202303021211024); Graduate Education Innovation Program Excellent Teaching Case Project (2023AL04)

Abstract:

Deep-learning-based image compression built on Convolutional Neural Networks (CNNs) has achieved excellent results. However, the receptive field of a CNN is usually limited, so it cannot learn the contextual relationships between pixels in non-local regions of an image and lacks long-range modeling and perception capability, which easily leads to distortion, artifacts, and relatively high compression rates. To address these problems, this study proposes two solutions. First, a symmetric encoder-decoder architecture composed of CNNs, a Multi-Scale Attention (MSA) mechanism, and residual units is designed; the MSA mechanism, introduced alongside the channel and spatial transformations of the image, recalibrates features and reduces redundant pixels in the latent representation. Second, a super-prior network based on a U-shaped framework is designed to obtain multi-scale contextual information at different levels; it helps extract high-level semantic features while retaining detailed low-level information, enabling better boundary refinement and detail recovery. Comparison experiments with other advanced image compression methods on the Kodak, Tecnick, and CLIC datasets show that, at the same bit rate, the proposed method improves the Peak Signal-to-Noise Ratio (PSNR) by approximately 0.3 dB, 0.6 dB, and 0.5 dB over the compared methods, respectively. The proposed method effectively enhances the reconstruction of non-repetitive texture features and image details while maintaining the compression rate.
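
For readers who want a concrete picture of the two components described above, the following is a minimal PyTorch-style sketch: a multi-scale attention block that recalibrates encoder/decoder features, and a small U-shaped hyper-network that gathers context at several resolutions to predict entropy-model parameters for the latent representation. All module names, channel widths, and layer choices here are illustrative assumptions, not the authors' implementation.

# Hedged sketch of the ideas in the abstract; names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    """Recalibrate features with an attention map computed at several scales."""
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in scales]
        )
        self.fuse = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for scale, conv in zip(self.scales, self.branches):
            y = F.avg_pool2d(x, scale) if scale > 1 else x      # look at a coarser view
            y = conv(y)
            if scale > 1:                                       # restore the input size
                y = F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
            feats.append(y)
        attn = torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))  # per-pixel gate in [0, 1]
        return x * attn + x                                        # residual recalibration

class UShapedHyperprior(nn.Module):
    """Tiny two-level U-shaped hyper-network over the latent y (mu/sigma for an entropy model)."""
    def __init__(self, latent_ch: int = 192, hyper_ch: int = 128):
        super().__init__()
        self.down1 = nn.Conv2d(latent_ch, hyper_ch, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(hyper_ch, hyper_ch, 3, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(hyper_ch, hyper_ch, 4, stride=2, padding=1)
        self.up1 = nn.ConvTranspose2d(hyper_ch * 2, latent_ch * 2, 4, stride=2, padding=1)

    def forward(self, y):
        e1 = F.relu(self.down1(y))                      # 1/2 resolution
        e2 = F.relu(self.down2(e1))                     # 1/4 resolution (bottleneck)
        d2 = F.relu(self.up2(e2))                       # back to 1/2 resolution
        params = self.up1(torch.cat([d2, e1], dim=1))   # skip connection keeps low-level detail
        mu, sigma = params.chunk(2, dim=1)              # Gaussian parameters for each latent element
        return mu, F.softplus(sigma)

# Example: recalibrate a feature map, then predict entropy-model parameters for it.
x = torch.randn(1, 192, 32, 32)
x = MultiScaleAttention(192)(x)
mu, sigma = UShapedHyperprior(192)(x)

Such learned codecs are typically trained with the standard rate-distortion objective L = R + λD, where R is the estimated bit rate of the quantized latents, D is the reconstruction distortion (usually MSE, from which PSNR is computed), and λ selects a point on the rate-distortion curve; the PSNR gains reported above are comparisons at matched bit rates on that curve.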

Key words: deep learning, image compression, Multi-Scale Attention (MSA) mechanism, super-prior networks, Transformer