
Computer Engineering

   

Perception-Driven Image Complexity Prediction and Heatmap Generation

  

Published: 2026-01-30


Abstract: Image complexity prediction is of significant research importance in visual cognition and computer vision. Existing methods still struggle to effectively simulate the human visual system, to balance model complexity against efficiency, and to generate interpretable pixel-level heatmaps. To address these issues, an Interpretable Global-Local Complexity Fusion Network (IGLCFN) is proposed. IGLCFN consists of three key modules: a complexity-aware encoding block, a cross-domain interaction module, and an interpretable image complexity heatmap generation module. The complexity-aware encoding block adopts a dual-branch structure that combines the global semantic modeling capability of Vision Transformers with the local fine-grained modeling capability of Convolutional Neural Networks, simulating the multi-level process by which the human visual system perceives image complexity. The cross-domain interaction module accounts for the distinct characteristics of the features extracted by each branch and aligns the Transformer's sequence features with the Convolutional Neural Network's spatial features. The interpretable heatmap generation module is trained with supervision from a pseudo-label dataset constructed by accumulating local perceptual scores, and produces interpretable pixel-level heatmaps consistent with human visual perception. Quantitative experiments on the IC9600 dataset show that IGLCFN achieves state-of-the-art performance on all key metrics: compared with a range of mainstream baseline, image-quality, and image-complexity prediction models, it attains the highest prediction accuracy while keeping computational resource consumption low. Experiments on the SAVOIAS dataset further validate the model's generalization ability and stability. Ablation studies confirm the rationality and effectiveness of the key modules, including the complexity-aware encoding block and the cross-domain interaction module, and qualitative analysis shows that the heatmaps generated by IGLCFN focus more accurately on regions of human visual attention.
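The abstract's cross-domain alignment of Transformer sequence features with CNN spatial features can be illustrated with a minimal NumPy sketch. All shapes, the function name, and the element-wise averaging fusion below are illustrative assumptions, not the paper's actual (learned) interaction module: a ViT branch is assumed to emit token sequences of shape (B, N, C) with N = H×W patches, and a CNN branch spatial maps of shape (B, C, H, W).

```python
import numpy as np

def align_and_fuse(vit_tokens, cnn_feats):
    """Reshape (B, N, C) tokens onto the patch grid as (B, C, H, W),
    then fuse with CNN features (here: simple averaging as a stand-in
    for the paper's learned cross-domain interaction)."""
    b, n, c = vit_tokens.shape
    _, c2, h, w = cnn_feats.shape
    assert c == c2 and n == h * w, "branches must agree on channels and grid"
    # (B, N, C) -> (B, H, W, C) -> (B, C, H, W)
    spatial = vit_tokens.reshape(b, h, w, c).transpose(0, 3, 1, 2)
    return 0.5 * (spatial + cnn_feats)

tokens = np.random.rand(2, 49, 64)   # hypothetical 7x7 patch grid, 64 channels
cnn = np.random.rand(2, 64, 7, 7)
fused = align_and_fuse(tokens, cnn)
print(fused.shape)  # (2, 64, 7, 7)
```

The key step is purely structural: once the token axis is unfolded back into the spatial grid it came from, the two branches share a common (B, C, H, W) layout and any fusion operator can be applied element-wise.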

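The pseudo-label construction described in the abstract, accumulating local perceptual scores into a pixel-level map, can be sketched as follows. This is a minimal illustration under stated assumptions: the window size, stride, and the local scorer (per-window standard deviation) are placeholders and not the paper's actual perceptual measure.

```python
import numpy as np

def pseudo_label_heatmap(img, win=8, stride=4):
    """Accumulate a scalar score per local window into a pixel-level
    heatmap, average overlapping contributions, and normalize to [0, 1]."""
    h, w = img.shape
    heat = np.zeros((h, w))
    count = np.zeros((h, w))
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = img[y:y + win, x:x + win]
            score = patch.std()              # placeholder local complexity score
            heat[y:y + win, x:x + win] += score
            count[y:y + win, x:x + win] += 1
    heat /= np.maximum(count, 1)             # average overlapping windows
    rng = heat.max() - heat.min()
    return (heat - heat.min()) / rng if rng > 0 else heat

img = np.random.rand(32, 32)                 # hypothetical grayscale input
hm = pseudo_label_heatmap(img)
print(hm.shape)  # (32, 32)
```

Because overlapping windows are averaged rather than summed, regions covered by more windows are not artificially inflated, which keeps the resulting map comparable across the image.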