
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 258-267. doi: 10.19678/j.issn.1000-3428.0069661

• Graphics and Image Processing •

  • Funding:
    National Key Research and Development Program of China (2020YFB1313600)

Facial Expression Recognition Method Integrating Attention Mechanism for Key Regions

PENG Qi1,2, LIU Yinhua2,*, SHANG Yunrui1,2

  1. Automation Institute, Qingdao University, Qingdao 266071, Shandong, China
    2. Institute for Future, Qingdao University, Qingdao 266071, Shandong, China
  • Received:2024-03-26 Revised:2024-06-07 Online:2025-11-15 Published:2024-06-26
  • Contact: LIU Yinhua


Abstract:

In real-world environments, the accuracy of facial expression recognition is typically low because of factors such as varying illumination intensities, facial occlusions, and pose variations during the detection of facial images. To address this robustness issue, this study proposes a facial expression recognition method that integrates a key region attention mechanism. Drawing inspiration from the facial perception mechanism of the human visual system, this method combines key facial regions with the overall facial region to enhance the recognition of complex and subtle expressions. During the key region extraction phase, the MTCNN algorithm is employed to sequentially feed facial data through three cascaded networks, thereby obtaining the positional information of facial keypoints. Based on anatomical studies of the face, this study introduces a Local Region Cropping (LRC) method to process the positional information and crop key facial region images. Subsequently, both the overall facial image and cropped key facial region images are separately input into a ResNet-50 network, followed by feature fusion. A Coordinate Attention (CA) mechanism, which encodes precise positional information, channel relationships, and long-range dependencies, is incorporated to direct the model's focus toward facial regions that contribute more significantly to expression classification. Experimental results on publicly available datasets, CK+ and FER2013, demonstrate that the proposed method achieves recognition accuracies of 96.9% and 73.22%, respectively. Compared with existing state-of-the-art methods, the method achieves significant improvements in accuracy, indicating that it offers valuable insights into network architecture and performance.
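As a rough illustration of the key-region extraction step described in the abstract, the sketch below crops fixed-size patches around facial landmarks of the kind MTCNN outputs (five points: eye centres, nose tip, mouth corners). The box size, the particular regions chosen, and the `key_regions` helper are assumptions for illustration only, not the paper's actual LRC rules:

```python
import numpy as np

def crop_region(img, center, half):
    """Crop a (2*half, 2*half) patch around (x, y) `center`, clamped to the image."""
    h, w = img.shape[:2]
    cx, cy = int(center[0]), int(center[1])
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    return img[y0:y1, x0:x1]

def key_regions(img, landmarks, half=16):
    """Cut eye and mouth patches from an aligned face image.

    `landmarks` holds (x, y) pairs keyed 'left_eye', 'right_eye',
    'mouth_left', 'mouth_right', e.g. taken from MTCNN's landmark output.
    """
    eyes_mid = np.add(landmarks['left_eye'], landmarks['right_eye']) / 2
    mouth_mid = np.add(landmarks['mouth_left'], landmarks['mouth_right']) / 2
    return {
        'left_eye': crop_region(img, landmarks['left_eye'], half),
        'right_eye': crop_region(img, landmarks['right_eye'], half),
        'eyes': crop_region(img, eyes_mid, half),
        'mouth': crop_region(img, mouth_mid, half),
    }
```

Each cropped patch (resized as needed) would then be fed to its own ResNet-50 branch alongside the whole-face image before feature fusion.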

Key words: facial expression recognition, key regions, facial landmarks, location information, Coordinate Attention (CA) mechanism
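The Coordinate Attention (CA) mechanism named above factorises attention into two direction-aware maps, one over rows and one over columns, so the reweighting preserves precise positional information. A minimal NumPy sketch of the forward pass, with illustrative weight shapes and omitting the batch normalisation and h-swish of the original CA block:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Simplified Coordinate Attention forward pass.

    x: feature map of shape (C, H, W).
    w_reduce: (C_mid, C) shared 1x1-conv weights for the reduced embedding.
    w_h, w_w: (C, C_mid) 1x1-conv weights producing the two attention maps.
    """
    C, H, W = x.shape
    # Directional pooling: average along one spatial axis while keeping
    # exact position along the other.
    pool_h = x.mean(axis=2)                        # (C, H) — one value per row
    pool_w = x.mean(axis=1)                        # (C, W) — one value per column
    # Concatenate along the spatial axis and embed with a shared transform.
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H+W)
    y = np.maximum(w_reduce @ y, 0.0)              # ReLU, (C_mid, H+W)
    # Split back into the two directions and map to per-axis attention in (0, 1).
    a_h = sigmoid(w_h @ y[:, :H])                  # (C, H)
    a_w = sigmoid(w_w @ y[:, H:])                  # (C, W)
    # Reweight the input by broadcasting row and column attention.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because each map keeps one spatial axis intact, the module can emphasise specific rows and columns of the fused feature map, which is the long-range, position-aware behaviour the method relies on to focus on expression-relevant facial regions.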