
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 63-71. doi: 10.19678/j.issn.1000-3428.0069690

• Artificial Intelligence and Pattern Recognition •

Knowledge Distillation Based on Instance Spectral Relations

ZHANG Zhengxiu, ZHOU Chun, YANG Meng*

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
  • Received: 2024-04-03 Revised: 2024-06-11 Online: 2025-11-15 Published: 2024-08-08
  • Contact: YANG Meng
  • Supported by:
    Aeronautical Science Foundation of China (2023Z071109001); Fundamental Research Funds for the Central Universities (2682023ZTPY021, 2682023ZTPY027)



Abstract:

The core challenge of Knowledge Distillation (KD) lies in extracting generic and sufficient knowledge from the Teacher model to effectively guide the learning of the Student model. Recent studies have found that, beyond learning soft labels, further learning the inter-instance relations in the deep feature space helps improve the performance of Student models. Existing instance relation-based KD methods widely adopt the global Euclidean distance to measure the affinity between instances. However, these methods overlook an intrinsic property of the high-dimensional deep feature space: the data actually lie on a low-dimensional manifold whose local structure resembles Euclidean space but whose global structure is complex. To address this issue, a novel KD method based on instance spectral relations is proposed. The method abandons the global Euclidean distance and instead constructs and analyzes a similarity matrix between each instance and its k nearest neighbors in the Teacher model's feature space to reveal the latent spectral graph structure. A new loss function is designed to guide the Student model not only to learn the probability distribution output by the Teacher model but also to mimic, at a finer granularity, the inter-instance relations represented by this spectral graph structure. Experimental results show that the proposed method significantly improves the performance of the Student model, raising the average classification accuracy by 2.33 percentage points over the baseline methods. These findings demonstrate the importance and effectiveness of incorporating the spectral graph structure between samples into the KD process.

Key words: Knowledge Distillation (KD), attention transfer, instance spectral relation, spectral graph structure, manifold learning
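The construction summarized in the abstract can be sketched in a few lines: build a row-normalized k-nearest-neighbor affinity matrix over a batch of features, then penalize the discrepancy between the Teacher's and the Student's affinity structures. The NumPy sketch below is a minimal illustration under assumed choices (Gaussian kernel, row normalization, MSE matching); the paper's actual kernel, spectral decomposition, and loss weighting are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def knn_affinity(feats, k=3, sigma=1.0):
    """Row-stochastic k-NN affinity matrix (assumed form): Gaussian
    kernel on pairwise Euclidean distances, self-similarity excluded,
    only each instance's k nearest neighbors retained."""
    n = feats.shape[0]
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-similarity
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    far = np.argsort(d2, axis=1)[:, k:]          # all but the k nearest
    w[np.arange(n)[:, None], far] = 0.0          # sparsify to k neighbors
    return w / w.sum(axis=1, keepdims=True)      # normalize each row

def spectral_relation_loss(t_feats, s_feats, k=3):
    """MSE between Teacher and Student affinity structures; a stand-in
    for the paper's spectral-relation loss, not its exact definition."""
    diff = knn_affinity(t_feats, k) - knn_affinity(s_feats, k)
    return float(np.mean(diff ** 2))
```

In a distillation loop this term would be added, with some weight, to the usual soft-label (KL-divergence) loss; it is zero exactly when the Student's local neighborhood graph matches the Teacher's.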