
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 63-71. doi: 10.19678/j.issn.1000-3428.0069690

• Artificial Intelligence and Pattern Recognition •

Knowledge Distillation Based on Instance Spectral Relations

ZHANG Zhengxiu, ZHOU Chun, YANG Meng*

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
  • Received: 2024-04-03 Revised: 2024-06-11 Online: 2025-11-15 Published: 2024-08-08
  • Contact: YANG Meng
  • Supported by:
    Aeronautical Science Foundation of China (2023Z071109001); Fundamental Research Funds for the Central Universities (2682023ZTPY021, 2682023ZTPY027)



Abstract:

The core challenge of Knowledge Distillation (KD) lies in extracting generic and sufficient knowledge from the Teacher model to effectively guide the learning of the Student model. Recent studies have found that, beyond learning soft labels, further learning the inter-instance relations in the deep feature space helps improve the performance of Student models. Existing instance relation-based KD methods widely adopt the global Euclidean distance to measure the affinity between instances. However, these methods overlook an intrinsic property of the high-dimensional deep feature space: the data actually lie on a low-dimensional manifold whose local structure resembles Euclidean space but whose global structure is complex. To address this issue, a novel KD method based on instance spectral relations is proposed. The method abandons the global Euclidean distance and instead constructs and analyzes a similarity matrix between each instance and its k nearest neighbors in the Teacher model's feature space to reveal the latent spectral graph structure. A new loss function is designed to guide the Student model not only to learn the probability distribution output by the Teacher model but also to mimic, at a finer granularity, the inter-instance relations represented by this spectral graph structure. Experimental results show that the proposed method significantly improves the performance of the Student model, raising the average classification accuracy by 2.33 percentage points over the baseline methods. These findings demonstrate the importance and effectiveness of incorporating the spectral graph structure between samples into the KD process.

Key words: Knowledge Distillation (KD), attention transfer, instance spectral relation, spectral graph structure, manifold learning
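The construction summarized in the abstract can be sketched in a few lines: build a row-normalized k-nearest-neighbor affinity matrix over a batch of features, then penalize the discrepancy between the Teacher's and the Student's affinity structures. The NumPy sketch below is a minimal illustration under assumed choices (Gaussian kernel, row normalization, MSE matching); the paper's actual kernel, spectral decomposition, and loss weighting are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def knn_affinity(feats, k=3, sigma=1.0):
    """Row-stochastic k-NN affinity matrix (assumed form): Gaussian
    kernel on pairwise Euclidean distances, self-similarity excluded,
    only each instance's k nearest neighbors retained."""
    n = feats.shape[0]
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-similarity
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    far = np.argsort(d2, axis=1)[:, k:]          # all but the k nearest
    w[np.arange(n)[:, None], far] = 0.0          # sparsify to k neighbors
    return w / w.sum(axis=1, keepdims=True)      # normalize each row

def spectral_relation_loss(t_feats, s_feats, k=3):
    """MSE between Teacher and Student affinity structures; a stand-in
    for the paper's spectral-relation loss, not its exact definition."""
    diff = knn_affinity(t_feats, k) - knn_affinity(s_feats, k)
    return float(np.mean(diff ** 2))
```

In a distillation loop this term would be added, with some weight, to the usual soft-label (KL-divergence) loss; it is zero exactly when the Student's local neighborhood graph matches the Teacher's.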