Facial Expression Recognition Based on Anti-Aliasing Residual Attention Network

doi:10.19678/j.issn.1000-3428.0065224

Abstract

Facial expression recognition suffers from low accuracy because effective features are difficult to extract and because expression categories are highly similar and easily confused. To address these problems, a facial expression recognition method based on an anti-aliasing residual attention network is proposed. First, because traditional downsampling tends to lose discriminative expression features, an anti-aliasing residual network is constructed to improve feature extraction from expression images and strengthen the representation of expression features, so that more effective global facial expression information is extracted. In addition, an improved channel attention mechanism and a label smoothing regularization strategy are used to strengthen attention to the key local expression regions of the face. The improved channel attention focuses on highly discriminative expression features and suppresses the weights of non-expression regions, thereby locating finer local expression regions within the global information extracted by the network. Label smoothing corrects the predicted probabilities by increasing the amount of information in the decision for each expression category and avoids overconfident predictions, which reduces misclassification between similar expressions. Experimental results show that the method reaches recognition accuracies of 88.14% and 89.31% on the facial expression datasets RAF-DB and FERPlus, respectively. It outperforms advanced methods such as DACL and VTFF and, compared with the original residual network, effectively improves the accuracy and robustness of facial expression recognition.
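The anti-aliasing idea mentioned in the abstract can be illustrated with a small sketch. It assumes the blur-then-subsample scheme described in reference [23] (a fixed low-pass filter applied before striding); the 3x3 binomial kernel, the reflect padding, and the names `blur_pool_2d` and `naive_subsample` are illustrative assumptions, not details taken from the paper.

```python
def blur_pool_2d(feature_map, stride=2):
    """Low-pass filter with a 3x3 binomial kernel, then subsample by `stride`."""
    taps = [1.0, 2.0, 1.0]
    # Outer product of [1, 2, 1] with itself, normalised so the weights sum to 1.
    kernel = [[a * b / 16.0 for b in taps] for a in taps]
    height, width = len(feature_map), len(feature_map[0])

    def reflect(i, n):
        # Reflect padding without repeating the edge: -1 -> 1, n -> n - 2.
        if i < 0:
            return -i
        if i >= n:
            return 2 * n - 2 - i
        return i

    out = []
    for r in range(0, height, stride):
        row = []
        for c in range(0, width, stride):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = feature_map[reflect(r + dr, height)][reflect(c + dc, width)]
                    acc += kernel[dr + 1][dc + 1] * value
            row.append(acc)
        out.append(row)
    return out


def naive_subsample(feature_map, stride=2):
    """Plain strided subsampling with no low-pass filter (prone to aliasing)."""
    return [row[::stride] for row in feature_map[::stride]]


# Hypothetical high-frequency input: vertical stripes, and the same stripes
# shifted by one pixel.
stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
shifted = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]

print(naive_subsample(stripes), naive_subsample(shifted))
print(blur_pool_2d(stripes), blur_pool_2d(shifted))
```

In this toy case the naive subsampling output flips completely when the input shifts by one pixel, while the blurred version is unchanged. This is the kind of instability under small shifts that anti-aliased downsampling is meant to reduce.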

Key words: facial expression recognition, residual network, anti-aliasing, label smoothing, attention mechanism
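Label smoothing, as named in the abstract and key words, can be sketched as follows. The sketch assumes the standard formulation from reference [25], in which the one-hot target is mixed with a uniform distribution: q'(k) = (1 - ε)·δ(k, y) + ε/K. The value ε = 0.1, the seven-class setup, and the logits are illustrative assumptions, not settings reported in the paper.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return the smoothed target (1 - eps) * one_hot + eps / K as a list."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) + uniform if k == true_class else uniform
            for k in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical example: 7 expression classes, true class index 3,
# and a confident prediction for that class.
logits = [0.2, -1.0, 0.5, 4.0, 0.1, -0.3, 1.5]
target = smooth_labels(true_class=3, num_classes=7, epsilon=0.1)
hard_target = smooth_labels(true_class=3, num_classes=7, epsilon=0.0)

loss = cross_entropy(target, logits)
hard_loss = cross_entropy(hard_target, logits)
print(target)
print(loss, hard_loss)
```

With smoothed targets, a very confident prediction still incurs some loss from the non-target classes, so the network is discouraged from producing near-absolute probabilities. This matches the abstract's stated goal of avoiding overly absolute predictions between similar expressions.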

Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG. Facial Expression Recognition Based on Anti-Aliasing Residual Attention Network[J]. Computer Engineering, 2023, 49(8): 190-198.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0065224

http://www.ecice06.com/EN/Y2023/V49/I8/190

Figures/Tables 15

References 35

1	LOWE D G. Object recognition from local scale-invariant features[C]//Proceedings of the 7th IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2002: 1150-1157.
2	OJALA T, PIETIKAINEN M, MAENPAA T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987. doi: 10.1109/TPAMI.2002.1017623
3	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2005: 886-893.
4	SHI Y, LV Z, BI N, et al. An improved SIFT algorithm for robust emotion recognition under various face poses and illuminations. Neural Computing and Applications, 2020, 32(13): 9267-9281. doi: 10.1007/s00521-019-04437-w
5	MISTRY K, JASEKAR J, ISSAC B, et al. Extended LBP based facial expression recognition system for adaptive AI agent behaviour[C]//Proceedings of International Joint Conference on Neural Networks. Washington D. C., USA: IEEE Press, 2018: 1-7.
6	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 770-778.
7	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks. Communications of the ACM, 2020, 63(11): 139-144. doi: 10.1145/3422622
8	YAO N M, GUO Q P, QIAO F C, et al. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 865-877. (in Chinese) doi: 10.16383/j.aas.2018.c170477
9	ZOU M, YOU M B, AKASHI T. Reconstruction of partially occluded facial image for classification. IEEJ Transactions on Electrical and Electronic Engineering, 2021, 16(4): 600-608. doi: 10.1002/tee.23335
10	PAN B W, WANG S F, XIA B. Occluded facial expression recognition enhanced through privileged information[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York, USA: ACM Press, 2019: 566-573.
11	YOVEL G, DUCHAINE B. Specialized face perception mechanisms extract both part and spacing information: evidence from developmental prosopagnosia. Journal of Cognitive Neuroscience, 2006, 18(4): 580-593. doi: 10.1162/jocn.2006.18.4.580
12	LI Y, ZENG J B, SHAN S G, et al. Patch-gated CNN for occlusion-aware facial expression recognition[C]//Proceedings of the 24th International Conference on Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 2209-2214.
13	LI Y, ZENG J B, SHAN S G, et al. Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Transactions on Image Processing, 2019, 28(5): 2439-2450. doi: 10.1109/TIP.2018.2886767
14	WANG K, PENG X, YANG J, et al. Region attention networks for pose and occlusion robust facial expression recognition. IEEE Transactions on Image Processing, 2020, 29(1): 4057-4069.
15	DING H, ZHOU P, CHELLAPPA R. Occlusion-adaptive deep network for robust facial expression recognition[C]//Proceedings of IEEE International Joint Conference on Biometrics. Washington D. C., USA: IEEE Press, 2021: 1-9.
16	WANG J, ZHAO K, CHENG Y. Facial expression recognition model based on convolutional neural network with occlusion perception. Computer Engineering, 2021, 47(10): 242-251. (in Chinese)
17	ZENG J, SHAN S, CHEN X. Facial expression recognition with inconsistently annotated datasets[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 227-243.
18	WANG K, PENG X J, YANG J F, et al. Suppressing uncertainties for large-scale facial expression recognition[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 6896-6905.
19	CHEN Y J, LIU S G. Deep partial occlusion facial expression recognition via improved CNN[C]//Proceedings of International Symposium on Visual Computing. Berlin, Germany: Springer, 2020: 451-462.
20	RUAN L, HAN Y, SUN J, et al. Facial expression recognition in facial occlusion scenarios: a path selection multi-network. Displays, 2022, 74, 102245. doi: 10.1016/j.displa.2022.102245
21	LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression[C]//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2010: 94-101.
22	LYONS M, AKAMATSU S, KAMACHI M, et al. Coding facial expressions with Gabor wavelets[C]//Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. Washington D. C., USA: IEEE Press, 2002: 200-205.
23	ZHANG R. Making convolutional networks shift-invariant again[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: International Machine Learning Society, 2019: 12712-12722.
24	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 3-19.
25	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 2818-2826.
26	LI S, DENG W H, DU J P. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 2584-2593.
27	BARSOUM E, ZHANG C, FERRER C C, et al. Training deep networks for facial expression recognition with crowd-sourced label distribution[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction. New York, USA: ACM Press, 2016: 279-283.
28	GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: a report on three machine learning contests[C]//Proceedings of the 20th International Conference on Neural Information Processing. Berlin, Germany: Springer, 2013: 117-124.
29	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n. ], 2015: 1-10.
30	LI S, DENG W H. Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Transactions on Image Processing, 2019, 28(1): 356-370. doi: 10.1109/TIP.2018.2868382
31	FARZANEH A H, QI X J. Facial expression recognition in the wild via deep attentive center loss[C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision. Washington D. C., USA: IEEE Press, 2021: 2401-2410.
32	HUANG C. Combining convolutional neural networks for emotion recognition[C]//Proceedings of IEEE MIT Undergraduate Research Technology Conference. Washington D. C., USA: IEEE Press, 2018: 1-4.
33	GEORGESCU M I, IONESCU R T, POPESCU M. Local learning with deep and handcrafted features for facial expression recognition. IEEE Access, 2019, 7, 64827-64836. doi: 10.1109/ACCESS.2019.2917266
34	MA F Y, SUN B, LI S T. Facial expression recognition with visual transformers and attentional selective fusion[J/OL]. IEEE Transactions on Affective Computing: 1-9[2022-07-26]. https://ieeexplore.ieee.org/document/9585378.
35	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 2020, 128(2): 336-359.

Network	Parameters/10⁶	Recognition accuracy/%
VGG13	9.41	85.17
VGG16	14.73	85.56
VGG19	20.04	86.41
ResNet18	11.18	86.44
ResNet34	21.29	86.54
ResNet50	23.52	86.21

Method (RAF-DB)	Recognition accuracy/%
DLP-CNN^[30]	84.13
gACNN^[13]	85.07
IPA2LT^[17]	86.77
SCN^[18]	87.03
DACL^[31]	87.78
Proposed method	88.14

Method (FERPlus)	Recognition accuracy/%
VGG13(+PLD)^[27]	85.36
ResNet+VGG^[32]	87.40
Local Learning Deep+BOW^[33]	87.76
RAN (ResNet18)^[14]	88.55
VTFF^[34]	88.81
Proposed method	89.31
