Hybrid Feature Facial Expression Recognition Model Based on DINO Prior

doi:10.19678/j.issn.1000-3428.0069519

Abstract

Facial Expression Recognition (FER) plays a crucial role in smart education. Current FER systems rely heavily on a single type of prior image feature, fail to integrate multiple image features effectively, and generalize poorly to facial expressions captured in natural environments. To address these problems, this study uses the large-scale vision model DINOv2 as a pre-trained backbone with its weights frozen, drawing on what it has learned from natural image datasets to obtain more universal image features and thereby improve the generalization of feature extraction. It further proposes HFFER, an FER model built on a hybrid feature network, which extracts distinct features from two different pre-trained models and fuses them through cross-attention mechanisms and multiple convolutions. Experiments show that the model achieves accuracies of 92.18% and 66.76% on the RAF-DB and AffectNet datasets, respectively, matching or surpassing existing models. The study offers a new approach to FER, and its application to real classroom images demonstrates its feasibility and potential in practical educational settings.

Keywords: Facial Expression Recognition (FER), pre-trained models, feature fusion, cross-attention mechanism, image classification
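
The abstract says that features from two pre-trained models are fused through cross-attention, but it gives no layer-level detail. The following is a minimal sketch of the general idea only, assuming single-head scaled dot-product attention with no learned projections and a residual connection. The function names, token values, and dimensions are hypothetical and are not taken from the paper. Tokens from one branch act as queries and attend over tokens from the other branch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let each query token attend over the tokens of the other branch.

    queries:     d-dimensional tokens from branch A (hypothetical)
    keys_values: d-dimensional tokens from branch B (hypothetical)
    Assumption: learned Q/K/V projections are omitted, so keys double as values.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Attention output: weighted average of the other branch's tokens.
        attended = [
            sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)
        ]
        # Residual connection keeps the query branch's own information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy stand-ins for token features from two frozen backbones (made-up values).
general_tokens = [[1.0, 0.0], [0.0, 1.0]]
face_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_tokens = cross_attention(general_tokens, face_tokens)
```

The output has one fused token per query token, so the query branch fixes the sequence length while the other branch supplies context. A practical model would typically add learned projections, multiple heads, and normalization, and according to the abstract HFFER also applies multiple convolutions.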

WANG Haojia, DENG Yongjian, LIU Tingting, YANG Zhen. Hybrid Feature Facial Expression Recognition Model Based on DINO Prior[J]. Computer Engineering, 2025, 51(10): 284-294.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069519

https://www.ecice06.com/EN/Y2025/V51/I10/284

Figures/Tables 13

Fig.1 Overall structure of HFFER model

Fig.2 Cross-attention mechanism

Fig.3 Dataset examples

Fig.4 Visualization results of various model features

Fig.5 Impact of different scales of DINOv2 models on performance

Fig.6 Impact of the number of fusion modules on model accuracy

Fig.7 Examples of facial expression recognition in authentic teaching scenarios
