[1] SIMONYAN K, ZISSERMAN A. Very deep
convolutional networks for large-scale image
recognition[C] //Proceedings of International Conference
on Learning Representations. San Diego, CA, USA: IEEE
Press, 2015: 2242-2251.
[2] HE K, ZHANG X, REN S, et al. Deep residual
learning for image recognition[C] //Proceedings of IEEE
International Conference on Computer Vision and Pattern
Recognition. Las Vegas, NV, USA: IEEE Computer
Society, 2016: 770-778.
[3] REN S, HE K, GIRSHICK R, et al. Faster R-CNN:
Towards real-time object detection with region proposal
networks[J]. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2017, 39(6): 1137-1149.
[4] YAN C, ZHANG H, LI X, et al. R-SSD: Refined
single shot multi-box detector for pedestrian detection[J].
Applied Intelligence, 2022, 52(9): 10430-10447.
[5] LIU Q, KORTYLEWSKI A, ZHANG Z, et al.
Learning part segmentation through unsupervised domain
adaptation from synthetic vehicles[C] //Proceedings of
IEEE International Conference on Computer Vision and
Pattern Recognition. New Orleans, LA, USA: IEEE
Computer Society, 2022: 19118-19129.
[6] PENG D, LEI Y, HAYAT M, et al. Semantic-aware
domain generalized segmentation[C] //Proceedings of
IEEE International Conference on Computer Vision and
Pattern Recognition. New Orleans, LA, USA: IEEE
Computer Society, 2022: 2584-2595.
[7] ZHOU E, XU X, XU B, et al. An enhancement model
based on dense atrous and inception convolution for
image semantic segmentation[J]. Applied Intelligence,
2023, 53(5): 5519-5531.
[8] 司念文, 张文林, 屈丹, 等. 卷积神经网络表征可视化研究综述[J]. 自动化学报, 2022, 48(8): 1890-1920.
SI N W, ZHANG W L, QU D, et al. Representation
visualization of convolutional neural networks: A
survey[J]. Acta Automatica Sinica, 2022, 48(8):
1890-1920. (in Chinese)
[9] EHSAN U, WINTERSBERGER P, LIAO Q V, et al.
Human-centered explainable AI: Beyond opening the
black-box of AI[C] //Proceedings of International
Conference on Human Factors in Computing Systems.
Long Beach, CA, USA: ACM Press, 2022: 1009-1020.
[10] GLOROT X, BENGIO Y. Understanding the
difficulty of training deep feedforward neural networks[C]
//Proceedings of International Conference on Artificial
Intelligence and Statistics. Sardinia, Italy: Microtome
Publishing, 2010: 249-256.
[11] HUANG G, LIU Z, VAN DER MAATEN L, et al.
Densely connected convolutional networks[C]
//Proceedings of IEEE International Conference on
Computer Vision and Pattern Recognition. Honolulu, HI,
USA: IEEE Computer Society, 2017: 2261-2269.
[12] MOHAMED E, SIRLANTZIS K, HOWELLS G. A
review of visualization and explanation techniques for
convolutional neural networks and their evaluation[J].
Displays, 2022, 73(5): 1245-1258.
[13] NGUYEN A, YOSINSKI J, CLUNE J.
Understanding neural networks via feature visualization:
A survey[M]. Cambridge, MA, USA: MIT Press, 2019.
[14] OYEDOTUN O K, EL RAHMAN SHABAYEK A,
AOUADA D, et al. Training very deep networks via residual learning with stochastic input shortcut
connections[C] //Proceedings of International Conference
on Neural Information Processing. Guangzhou, China:
Springer Verlag, 2017: 23-33.
[15] OYEDOTUN O K, ISMAEIL K A, AOUADA D.
Why is everyone training very deep neural network with
skip connections?[J]. IEEE Transactions on Neural
Networks and Learning Systems, 2023, 34(9): 5961-5975.
[16] IOFFE S, SZEGEDY C. Batch normalization:
Accelerating deep network training by reducing internal
covariate shift[C] //Proceedings of International
Conference on Machine Learning. Lille, France: IEEE
Press, 2015: 448-456.
[17] CHEN Y, LI J, XIAO H, et al. Dual path networks[C]
//Proceedings of Annual Conference on Neural
Information Processing Systems. Long Beach, CA, USA:
NIPS Foundation, 2017: 4468-4476.
[18] ZHANG X, LI Z, LOY C C, et al. PolyNet: A pursuit
of structural diversity in very deep networks[C]
//Proceedings of IEEE International Conference on
Computer Vision and Pattern Recognition. Honolulu, HI,
USA: IEEE Computer Society, 2017: 3900-3908.
[19] SZEGEDY C, IOFFE S, VANHOUCKE V, et al.
Inception-v4, Inception-ResNet and the impact of residual
connections on learning[C] //Proceedings of AAAI
Conference on Artificial Intelligence. San Francisco, CA,
USA: AAAI Press, 2017: 4278-4284.
[20] VASWANI A, SHAZEER N, PARMAR N, et al.
Attention is all you need[C] //Proceedings of Annual
Conference on Neural Information Processing Systems.
Long Beach, CA, USA: NIPS Foundation, 2017:
5999-6009.
[21] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et
al. An image is worth 16×16 words: Transformers for
image recognition at scale[C] //Proceedings of
International Conference on Learning Representations.
Washington D. C., USA: IEEE Press, 2021: 5278-5284.
[22] DAI D, LI Y, WANG Y, et al. Rethinking the image
feature biases exhibited by deep convolutional neural
network models in image recognition[J]. CAAI
Transactions on Intelligence Technology, 2022, 7(4):
721-731.
[23] FONG R C, VEDALDI A. Interpretable explanations
of black boxes by meaningful perturbation[C]
//Proceedings of IEEE International Conference on
Computer Vision. Venice, Italy: IEEE Press, 2017:
3449-3457.
[24] FONG R, PATRICK M, VEDALDI A.
Understanding deep networks via extremal perturbations
and smooth masks[C] //Proceedings of IEEE International
Conference on Computer Vision. Seoul, Republic of
Korea: IEEE Press, 2019: 2950-2958.
[25] ZEILER M D, FERGUS R. Visualizing and
understanding convolutional networks[C] //Proceedings
of European Conference on Computer Vision. Zurich,
Switzerland: Springer Verlag, 2014: 818-833.
[26] SMILKOV D, THORAT N, KIM B, et al.
SmoothGrad: Removing noise by adding noise[J]. IEEE
Transactions on Multimedia, 2019, 20(8): 2323-2334.
[27] SUNDARARAJAN M, TALY A, YAN Q. Axiomatic
attribution for deep networks[C] //Proceedings of
International Conference on Machine Learning. Sydney,
NSW, Australia: IEEE Press, 2017: 5109-5118.
[28] KIM B, SEO J, JEON S, et al. Why are saliency
maps noisy? Cause of and solution to noisy saliency maps[C]
//Proceedings of IEEE International Conference on
Computer Vision Workshops. Seoul, Republic of Korea:
IEEE Press, 2019: 4149-4157.
[29] GU J, YANG Y, TRESP V. Understanding individual
decisions of CNNs via contrastive backpropagation[C]
//Proceedings of Asian Conference on Computer Vision.
Perth, WA, Australia: Springer Verlag, 2019: 119-134.
[30] IWANA B K, KUROKI R, UCHIDA S. Explaining
convolutional neural networks using softmax gradient
layer-wise relevance propagation[C] //Proceedings of IEEE International Conference on Computer Vision
Workshops. Seoul, Republic of Korea: IEEE Press, 2019:
4176-4185.
[31] SELVARAJU R R, COGSWELL M, DAS A, et al.
Grad-CAM: Visual explanations from deep networks via
gradient-based localization[J]. International Journal of
Computer Vision, 2020, 128(2): 336-359.
[32] SHI T, LI Y, LIANG H, et al. Score-CAM: Class
activation map based on logarithmic transformation[C]
//Proceedings of IEEE International Conference on Signal
Processing. Beijing, China: IEEE Press, 2022: 256-259.
[33] MONTAVON G, LAPUSCHKIN S, BINDER A, et
al. Explaining nonlinear classification decisions with deep
Taylor decomposition[J]. Pattern Recognition, 2017, 65:
211-222.
[34] YOSINSKI J, CLUNE J, NGUYEN A, et al.
Understanding neural networks through deep
visualization[J]. Neural Networks, 2015, 34: 345-356.
[35] WANG F, LIU H, CHENG J. Visualizing deep neural
network by alternately image blurring and deblurring[J].
Neural Networks, 2018, 97: 162-172.
[36] SHI R, LI T, YAMAGUCHI Y. Group visualization
of class-discriminative features[J]. Neural Networks,
2020, 129: 75-90.
[37] KATZMANN A, TAUBMANN O, AHMAD S, et al.
Explaining clinical decision support systems in medical
imaging using cycle-consistent activation
maximization[J]. Neurocomputing, 2021, 458: 141-156.
[38] MAHENDRAN A, VEDALDI A. Understanding
deep image representations by inverting them[C]
//Proceedings of IEEE International Conference on
Computer Vision and Pattern Recognition. Boston, MA,
USA: IEEE Computer Society, 2015: 5188-5196.
[39] DENG J, DONG W, SOCHER R, et al. ImageNet: A
large-scale hierarchical image database[C] //Proceedings
of IEEE International Conference on Computer Vision
and Pattern Recognition. Miami, FL, USA: IEEE
Computer Society, 2009: 248-255.