
Computer Engineering ›› 2025, Vol. 51 ›› Issue (10): 284-294. doi: 10.19678/j.issn.1000-3428.0069519

• Graphics and Image Processing •

Hybrid Feature Facial Expression Recognition Model Based on DINO Prior

WANG Haojia1, DENG Yongjian1,*, LIU Tingting2, YANG Zhen1

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  2. School of Education, Hubei University, Wuhan 430072, Hubei, China
  • Received: 2024-03-08 Revised: 2024-05-27 Online: 2025-10-15 Published: 2024-08-06
  • Contact: DENG Yongjian

Original title (Chinese): 基于DINO先验的混合特征面部表情识别模型

Authors (Chinese): 王皓嘉1, 邓勇舰1,*, 刘婷婷2, 杨震1
  • Supported by:
    National Key Research and Development Program of China (2022YFF0610000)

Abstract:

Facial Expression Recognition (FER) plays a crucial role in smart education. Existing FER systems depend heavily on a single kind of prior image feature, do not integrate multiple image features effectively, and generalize poorly to facial expressions captured in natural, unconstrained environments. To address these problems, this study uses the large-scale visual model DINOv2 as a pre-trained model with its weights frozen, drawing on what the model has learned from natural image datasets to obtain more general image features and thereby improve the generalization of feature extraction. The study further proposes HFFER, an FER model built on a hybrid feature network, which uses two different pre-trained models to extract distinct features and fuses them through cross-attention mechanisms and multiple convolutions. Experimental results show that the model achieves accuracies of 92.18% and 66.76% on the RAF-DB and AffectNet datasets, respectively, matching or surpassing existing models. The study offers a new approach to FER, and its application to real classroom images demonstrates its feasibility and potential in practical educational settings.

Key words: Facial Expression Recognition (FER), pre-trained models, feature fusion, cross-attention mechanism, image classification
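
The abstract says that features from two pre-trained models are fused through cross-attention. The NumPy sketch below illustrates only that general idea; it is not the authors' HFFER architecture. The token counts, feature widths, random projection matrices, and the choice of which feature set supplies the queries are all assumptions made for illustration. The frozen DINOv2 backbone and the convolutional layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends to all context tokens.

    queries: (n_q, d_q) tokens from one feature extractor
    context: (n_c, d_c) tokens from the other feature extractor
    Returns the attended features (n_q, d) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Hypothetical shapes: 49 tokens per image, feature widths 32 and 48, shared width 16.
rng = np.random.default_rng(0)
d_a, d_b, d = 32, 48, 16
feats_a = rng.standard_normal((49, d_a))  # stand-in for features from the frozen general-purpose model
feats_b = rng.standard_normal((49, d_b))  # stand-in for features from the second pre-trained model

# Random projections stand in for weights that would be learned during training.
w_q = rng.standard_normal((d_b, d)) / np.sqrt(d_b)
w_k = rng.standard_normal((d_a, d)) / np.sqrt(d_a)
w_v = rng.standard_normal((d_a, d)) / np.sqrt(d_a)

# Assumption: the second model's tokens act as queries over the first model's tokens.
fused, attn = cross_attention(feats_b, feats_a, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In a trained model the projection matrices would be learned, and the fused tokens would pass through further layers before classification; the abstract mentions multiple convolutions at this stage but does not specify their configuration.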
